APPLICATION OF A LARGE SAMPLING CRITERION TO SOME
SAMPLING PROBLEMS IN FACTOR ANALYSIS* 


DAYLE D. RIPPE
OPERATIONS ANALYSIS OFFICE, STRATEGIC AIR COMMAND 


A technique is presented to test the completeness of factor solutions
and also to test the significance of common-component loadings. The
chi-square test involved is based upon the asymptotic normal properties
of the residuals.


1. Introduction. Consider a k-variate universe in the variables $y_i$ and
a random sample of size N from this universe. Denote

$$\bar{y}_i = \frac{1}{N}\sum_{\alpha=1}^{N} y_{i\alpha} \tag{1}$$

and consider the new variable

$$x_i = y_i - \bar{y}_i . \tag{2}$$

Then form the covariances of these variables in the usual way,

$$m_{ij} = \frac{1}{n}\sum_{\alpha=1}^{N} x_{i\alpha}x_{j\alpha} \qquad (n = N - 1). \tag{3}$$

The various techniques of factor analysis mathematically form a linear
transformation of the k variables $x_i$ into the form

$$x_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{is}F_s + a_iU_i , \tag{4}$$

where the $F_p$ are called common components and the $U_i$ are called unique
components. If the new variables are selected such that their sample means 
are zero and sample variances are unity, and further such that the unique 
components are not correlated among themselves nor with the common 
components, then in terms of these new variables, (3) becomes 


$$m_{ij} = \sum_{p=1}^{s} a_{ip}a_{jp} + \sum_{p=1}^{s-1}\sum_{q=p+1}^{s} (a_{ip}a_{jq} + a_{jp}a_{iq})\,r_{pq} + \delta_{ij}a_ia_j , \tag{5}$$

where

$$r_{pq} = \frac{1}{n}\sum_{\alpha=1}^{N} F_{p\alpha}F_{q\alpha} \tag{6}$$

and where $\delta_{ij}$ is the Kronecker delta. Finally, when the common components
are chosen in such a way that $r_{pq}$ is zero, then (5) becomes

$$m_{ij} = \sum_{p=1}^{s} a_{ip}a_{jp} + \delta_{ij}a_ia_j . \tag{7}$$

*The research work on which the results presented are based was conducted under
the supervision of Prof. P. S. Dwyer, Mathematics Department, University of Michigan.
The complete results of this research were presented in a Ph. D. thesis, June, 1951.

The same technique may be applied to the standard deviates instead 
of the deviates of (2); that is, to 


$$z_i = \frac{y_i - \bar{y}_i}{\sqrt{m_{ii}}} .$$

In this case sample correlation coefficients rather than the sample covariances 
are involved. The theory developed in the next sections applies to either the 
correlation coefficients or the covariances. Maintaining the covariance 
symbol defined in (3) emphasizes the generality of the method. 
Using a matrix notation:

$$\begin{aligned}
m &= (m_{ij}) && (k \times k \text{ covariance matrix}),\\
a_{cs} &= (a_{ij}) && (k \times s \text{ common component loading matrix}),\\
a_u &= (a_i) && (k \times k \text{ diagonal unique component loading matrix}),
\end{aligned} \tag{8}$$

the matrix equation for (7) is

$$m = a_{cs}a'_{cs} + a_ua'_u , \tag{9}$$

where the primes indicate matrix transposes.
The matrix

$$m - a_ua'_u = a_{cs}a'_{cs} \tag{10}$$

is then actually factored by this technique, and thus the idea of matrix
factorization is introduced into the problem. The number of linearly
independent columns, s, in $a_{cs}$ is equal to the rank of $m - a_ua'_u$. In order to
satisfy (10) within rounding error, it will usually be required that s = k.
However, the quantities $m_{ij}$ are subject to sampling variation, and it is
desirable to carry the component solution only to the point that (10) is
satisfied within that variation. In other words, if the residual matrix is
denoted

$$(m - a_ua'_u) - a_{cs}a'_{cs} ,$$

the solution should be carried only to the point that this residual matrix is
zero within possible sampling variation. Any components obtained beyond
this point will be regarded as insignificant.

A number of different tests have been given for determining the number 
of significant components. In most cases they are designed for particular
types of factor solutions. Among the writers who have proposed tests are 
Coombs (2), Hoel (5, 6), Holzinger and Harman (7), Hotelling (8), and 
Lawley (9, 11). In this paper a technique comparable to that proposed by 
Lawley (9) is developed with a slightly different interpretation concerning 
the basis for the significance test, in order to provide a wider range of appli- 
cation. The theory goes back to the sampling distribution of the covariances, 
and, in fact, as special cases large sampling tests for the significance of various 
correlation coefficients can be obtained. 


2. Interpretation of the Sampling Problem and Development of the Large 
Sampling Criterion for Completeness of Factorization. In order to arrive at a 
test for the significance of components in matrix factorization, it is first 
necessary to define a significant component. Consider an arbitrary orthogonal 
matrix factorization of the sample covariance matrix which results in (10) 
or equivalently in (7) for all i, j. Define a significant component in the
following way: If it is hypothesized that

$$\mu_{ij} = \sum_{p=1}^{s} a_{ip}a_{jp} + \delta_{ij}a_ia_j , \tag{11}$$


and if the sample covariance matrix can be regarded as the covariance matrix 
of a random sample from the population whose covariance matrix has as a 
typical element $\mu_{ij}$, then there are only s significant common components
present in the sample covariance matrix. This definition of significant com- 
ponent fits quite well into the picture of the general usage of factor analysis. 
That is, one deals with a sample covariance (or correlation) matrix in the 
factor process, and the interpretation of the results is usually in terms of 
the population from which the random sample is assumed to have been 
drawn. Hence, the hypothesis of (11) is inherent in the interpretation of 
the results whether or not the sampling criterion is applied. 

In order to provide the sampling formula required, assume that the 
original variables have a joint k-variate normal distribution whose probability 
density function is 


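$$f(y_1, y_2, \cdots, y_k) = \frac{1}{(2\pi)^{k/2}\,|\mu|^{1/2}} \exp\Big\{-\tfrac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k} \mu^{ij}\,(y_i - E y_i)(y_j - E y_j)\Big\} , \tag{12}$$

where $E y_i$ is the population mean of $y_i$, $\mu$ is the covariance matrix,
$|\mu|$ is the determinant of this matrix, and $\mu^{ij}$ is the ijth element in
the matrix inverse to $\mu$. Although the assumption of normality is rather
restrictive in nature, it appears that much of the validity and significance
of factor processes rests rather directly on this assumption.
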
With the underlying normality, it follows that the $m_{ij}$ of (3) have a Wishart
distribution (14, Sect. 11.1) whose probability density function is

$$W(m_{ij}; \mu^{ij}) = \frac{A^{n/2}\,(n/2)^{nk/2}\,|m_{ij}|^{(n-k-1)/2}\exp\big\{-\tfrac{n}{2}\sum_{i,j}\mu^{ij}m_{ij}\big\}}{\pi^{k(k-1)/4}\prod_{t=1}^{k}\Gamma\big(\tfrac{n+1-t}{2}\big)} , \tag{13}$$

where $A = |\mu^{ij}|$.


The $m_{ij}$ defined in (3) are unbiased estimates of $\mu_{ij}$. That they are
also maximum-likelihood estimates of $\mu_{ij}$ may be verified by obtaining the
maximum points of the function

$$\log W = \log K + \frac{n}{2}\log A - \frac{n}{2}\sum_{i,j}\mu^{ij}m_{ij} , \tag{14}$$

where K is independent of all $\mu_{ij}$. By taking the partial derivative of this
function with respect to $\mu^{pq}$ for all p, q and setting the resulting equations
equal to zero, we obtain the maximum-likelihood estimates

$$\hat{\mu}_{pq} = m_{pq} . \tag{15}$$

Now the $\tfrac{1}{2}k(k + 1)$ quantities $(m_{ip} - \mu_{ip})$ are such that

$$(m_{ip} - \mu_{ip}) = O\Big(\frac{1}{\sqrt{N}}\Big) \tag{16}$$

(the usual order notation is used with the implication in this case that
$\sqrt{N}(m_{ip} - \mu_{ip})$ remains finite as N becomes indefinitely large) and have a
joint normal distribution in the limit with zero means and covariance matrix
$(\xi_{ip,jq})^{-1}$, where

$$\xi_{ip,jq} = -E_{\mu'}\Big[\frac{\partial^2 \log W}{\partial\mu_{ip}\,\partial\mu_{jq}}\Big] . \tag{17}$$


In (17) the expectation is over the entire sample space with the subscript
$\mu'$ indicating that the parameters $\mu_{ij}$ are fixed at the true population values.
It should be remarked that (16), (17), and the above remarks are valid if
some weak restrictions on the original distribution hold (13, Sect. 44). For
the Wishart distribution (which is indeed well behaved!) these restrictions
do hold. It follows at once that the exponent

$$\lambda = \sum_{i \le p}\sum_{j \le q} \xi_{ip,jq}\,(m_{ip} - \mu_{ip})(m_{jq} - \mu_{jq}) \tag{18}$$

has a chi-square distribution with degrees of freedom equal to $\tfrac{1}{2}k(k + 1)$
minus the number of independent linear restrictions among the variables
$(m_{ip} - \mu_{ip})$, or the number of linearly independent variables $(m_{ip} - \mu_{ip})$.
In order to obtain a direct way for calculating $\lambda$, consider the Taylor
series expansion with remainder, $R_3$, of $\log W$ about the point $\mu_{ij} = m_{ij}$
in the $\tfrac{1}{2}k(k + 1)$-dimensional parameter space. Thus

$$\log W = \log W\big|_{\mu_{ij}=m_{ij}} + \sum (\mu_{ij} - m_{ij})\,\frac{\partial \log W}{\partial\mu_{ij}}\Big|_{\mu_{ij}=m_{ij}} + \frac{1}{2!}\sum (\mu_{ip} - m_{ip})(\mu_{jq} - m_{jq})\,\frac{\partial^2 \log W}{\partial\mu_{ip}\,\partial\mu_{jq}}\Big|_{\mu_{ij}=m_{ij}} + R_3 . \tag{19}$$

It can be shown that

$$R_3 = O\Big(\frac{1}{\sqrt{N}}\Big)$$

and

$$\xi_{ip,jq} + \frac{\partial^2 \log W}{\partial\mu_{ip}\,\partial\mu_{jq}}\Big|_{\mu_{ij}=m_{ij}} = O\Big(\frac{1}{\sqrt{N}}\Big) .$$

Thus for large sample approximations, $R_3$ may be dropped and $\xi_{ip,jq}$ may
be used in the third term on the right of (19). Further the second term on
the right is zero since all of the first partial derivatives are zero at the
maximum point. Hence, using (18) and (14), $\lambda$ becomes





$$\lambda = n\Big\{\log|\mu_{ij}| - \log|m_{ij}| + \sum_{i,j}\mu^{ij}m_{ij} - k\Big\} , \tag{20}$$

where $\lambda$ has a chi-square distribution with degrees of freedom as noted
earlier. This same expression was obtained by Lawley (9, Sect. 6) for his 
maximum-likelihood technique of factor analysis. The more general develop- 
ment presented above yields a much wider range of application. The number 
of degrees of freedom associated with (20) when applied to the more general 
problem (regardless of method of factor solution) is greater than that given 
by Lawley. This reflects the fact that the estimates of the component loadings 
in any general method of component analysis are not efficient estimates. 
The power of the resulting test is then lower than that of the test furnished 
by Lawley. In the next paragraph the theory is developed for fixing the 
degrees of freedom for the general problem. 
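
As a check on the step from (19) to (20), the reduction can be written out as follows. This is only a sketch under the assumptions already stated in the text (K does not depend on the $\mu_{ij}$, $A = |\mu^{ij}| = 1/|\mu_{ij}|$, and the remainder of the expansion is dropped); it is not quoted from the original derivation.

```latex
\begin{align*}
\log W\big|_{\mu_{ij}=m_{ij}}
  &= \log K - \tfrac{n}{2}\log|m_{ij}| - \tfrac{n}{2}\,k
  && \text{since } \textstyle\sum_{i,j} m^{ij}m_{ij} = k, \\
\log W
  &= \log K - \tfrac{n}{2}\log|\mu_{ij}| - \tfrac{n}{2}\textstyle\sum_{i,j}\mu^{ij}m_{ij}
  && \text{from (14)}, \\
\lambda
  &\approx 2\,\big(\log W\big|_{\mu_{ij}=m_{ij}} - \log W\big)
  && \text{from (18) and (19)}, \\
  &= n\big\{\log|\mu_{ij}| - \log|m_{ij}| + \textstyle\sum_{i,j}\mu^{ij}m_{ij} - k\big\}
  && \text{which is (20)}.
\end{align*}
```

Written this way, the criterion is twice the drop in the log-likelihood between its maximum and the hypothesized point, which is why it vanishes when the hypothesized and sample covariances coincide.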

The expression $m_{ij} - \mu_{ij}$ (where $\mu_{ij}$ is as defined in [11]) is commonly
called a residual. The hypothesized covariances and residuals after
s (0 < s < k) common components have been analyzed will be denoted

$$\text{(a)}\quad {}_s\mu_{ij} = \sum_{p=1}^{s} a_{ip}a_{jp} + \delta_{ij}a_ia_j , \tag{21}$$

$$\text{(b)}\quad {}_sr_{ij} = m_{ij} - {}_s\mu_{ij} . \tag{22}$$

From (7), with the upper limit of summation equal to k,

$$m_{ij} = \sum_{p=1}^{k} a_{ip}a_{jp} + \delta_{ij}a_ia_j . \tag{23}$$

Then using (23) in (22), we have

$${}_sr_{ij} = \sum_{p=s+1}^{k} a_{ip}a_{jp} . \tag{24}$$

The matrix of all ${}_sr_{ij}$ may be written:

$$({}_sr_{ij}) =
\begin{bmatrix}
a_{1,s+1} & a_{1,s+2} & \cdots & a_{1k} \\
a_{2,s+1} & a_{2,s+2} & \cdots & a_{2k} \\
\vdots & \vdots & & \vdots \\
a_{k,s+1} & a_{k,s+2} & \cdots & a_{kk}
\end{bmatrix}
\begin{bmatrix}
a_{1,s+1} & a_{2,s+1} & \cdots & a_{k,s+1} \\
a_{1,s+2} & a_{2,s+2} & \cdots & a_{k,s+2} \\
\vdots & \vdots & & \vdots \\
a_{1k} & a_{2k} & \cdots & a_{kk}
\end{bmatrix} . \tag{24a}$$

Thus $({}_sr_{ij})$ is a k by k − s matrix times its transpose; and hence, since the
columns of this matrix are linearly independent, it follows that the rank of
$({}_sr_{ij})$ is exactly k − s. This implies that we can find k − s rows (or columns)
in $({}_sr_{ij})$ and express all other rows (or columns) as linear combinations of
these. Further, the sub-matrix consisting of these rows and columns is
symmetric, and hence has $\tfrac{1}{2}(k - s)(k - s + 1)$ different elements. Thus, the
original matrix $({}_sr_{ij})$ has exactly $\tfrac{1}{2}(k - s)(k - s + 1)$ linearly independent
elements, and hence the number of degrees of freedom for the chi-square
test of (20) after s common components are removed is

$$\text{d.f.} = \tfrac{1}{2}(k - s)(k - s + 1). \tag{25}$$
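
The count in (25) is easy to tabulate. The following is a minimal sketch in Python; the function name is a hypothetical label introduced here, and the boundary cases s = 0 and s = k are allowed only for convenience, although the text restricts s to 0 < s < k.

```python
def residual_degrees_of_freedom(k: int, s: int) -> int:
    """Degrees of freedom from (25) after s of k possible components are removed."""
    if not 0 <= s <= k:
        raise ValueError("s must satisfy 0 <= s <= k")
    # (k - s)(k - s + 1) is a product of consecutive integers, so it is even.
    return (k - s) * (k - s + 1) // 2


if __name__ == "__main__":
    # Cases that occur in the worked examples of this paper.
    for k, s in [(4, 2), (4, 3), (13, 3)]:
        print(f"k = {k:2d}, s = {s}: d.f. = {residual_degrees_of_freedom(k, s)}")
```

For four variables this gives 3 and 1 degrees of freedom after two and three components, and for thirteen variables with three components it gives 55, the values used in the examples.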

With respect to the computations required in the application of (20),
it should be noted that for large k, the amount of labor required may be
considerable. However, the quantities $|\mu_{ij}|$ and $(\mu^{ij})$ can be computed
using the Dwyer-Guttman technique (3, 4). For cases in which s is
considerably less than k this technique effects a considerable saving in the
computations required, and does make the application of (20) feasible.

Example: Assume a sample correlation matrix of the form:

$$m = \begin{bmatrix}
1.0000 & .5710 & .7330 & .6835 \\
* & 1.0000 & .7812 & .7514 \\
* & * & 1.0000 & .8842 \\
* & * & * & 1.0000
\end{bmatrix}$$

(Symmetric elements are indicated with the asterisk.) Assume that this
was obtained in a random sample of size 100. A complete and arbitrary
factor solution is

$$a_{cs} = \begin{bmatrix}
.70 & .30 & .10 & .05 \\
.50 & .60 & .40 & .02 \\
.80 & .50 & .20 & .06 \\
.70 & .60 & .10 & .07
\end{bmatrix}$$

and

$$a_ua'_u = \begin{bmatrix}
.4075 & 0 & 0 & 0 \\
0 & .2296 & 0 & 0 \\
0 & 0 & .0664 & 0 \\
0 & 0 & 0 & .1351
\end{bmatrix} .$$

The completeness of the factor solution after two and then after three of
the components in order from left to right in $a_{cs}$ were removed was tested
by (20). The quantities needed are:

$|m_{ij}| = .0372$.

$(\mu^{ij})$, two components:

$$\begin{bmatrix}
2.1944 & .0122 & -1.3399 & -.3318 \\
* & 3.3712 & -1.3221 & -1.2844 \\
* & * & 6.1926 & -3.5437 \\
* & * & * & 5.2593
\end{bmatrix}$$

$(\mu^{ij})$, three components:

$$\begin{bmatrix}
2.1908 & .0666 & -1.3277 & -.3732 \\
* & 2.6912 & -1.4791 & -\text{[illegible]} \\
* & * & 6.1578 & -3.4235 \\
* & * & * & 4.8647
\end{bmatrix}$$

                    Two Components    Three Components
$|\mu_{ij}|$:           .0274             .0369
$\lambda$:              4.36, 3 d.f.      .00, 1 d.f.

Apparently two components completely factor the matrix, and three com- 
ponents certainly do. 
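
The computation in this example can be sketched in a few lines of Python using only the standard library. Several things here are assumptions rather than part of the original: the function names are hypothetical, the determinant and inverse are obtained by a Cholesky factorization instead of the Dwyer-Guttman technique, and the fourth row of the loading matrix is taken as (.70, .60, .10, .07) because that choice reproduces the printed correlations .6835, .7514, .8842 and the uniqueness .1351. Since the printed determinants are rounded to three figures, full-precision arithmetic need not reproduce the printed value of the criterion for two components exactly.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][t] * low[j][t] for t in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def log_det(a):
    """Natural log of the determinant, from the Cholesky diagonal."""
    return 2.0 * sum(math.log(row[i]) for i, row in enumerate(cholesky(a)))


def trace_inv_times(mu, m):
    """Sum over i, j of mu^{ij} m_ij, that is trace(mu^{-1} m)."""
    low, k, total = cholesky(mu), len(mu), 0.0
    for c in range(k):
        y = [0.0] * k                      # forward solve: low y = column c of m
        for i in range(k):
            y[i] = (m[i][c] - sum(low[i][t] * y[t] for t in range(i))) / low[i][i]
        x = [0.0] * k                      # back solve: low' x = y
        for i in reversed(range(k)):
            x[i] = (y[i] - sum(low[t][i] * x[t] for t in range(i + 1, k))) / low[i][i]
        total += x[c]                      # c-th diagonal element of mu^{-1} m
    return total


def criterion(mu, m, n):
    """Large-sample criterion (20)."""
    return n * (log_det(mu) - log_det(m) + trace_inv_times(mu, m) - len(m))


def reproduced(loadings, uniq, s):
    """Covariances (21) from the first s columns of loadings plus uniquenesses."""
    k = len(loadings)
    return [[sum(loadings[i][p] * loadings[j][p] for p in range(s))
             + (uniq[i] if i == j else 0.0) for j in range(k)] for i in range(k)]


A = [[.70, .30, .10, .05],
     [.50, .60, .40, .02],
     [.80, .50, .20, .06],
     [.70, .60, .10, .07]]      # fourth row is an assumption (see lead-in)
UNIQ = [.4075, .2296, .0664, .1351]
N = 100

m = reproduced(A, UNIQ, 4)      # the complete solution reproduces the sample matrix
lam2 = criterion(reproduced(A, UNIQ, 2), m, N - 1)
lam3 = criterion(reproduced(A, UNIQ, 3), m, N - 1)
print(f"|m| = {math.exp(log_det(m)):.4f}")
print(f"two components:   lambda = {lam2:.2f}")
print(f"three components: lambda = {lam3:.2f}")
```

Both values stay below the 5% chi-square points for 3 and 1 degrees of freedom (7.81 and 3.84), in agreement with the conclusion drawn in the text.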


3. Sampling Variation of Component Loadings. In this section another 
sampling problem concerned with the variation in the magnitude of com- 
ponent loadings due to random sampling is studied, together with a test 
for the significance of individual component loadings. This section deals 
exclusively with orthogonal solutions, and the next section generalizes the 
concept to include sampling variability in the oblique component solutions. 

Again it is necessary to set up an underlying definition that fits the 
ordinary interpretation of the factor analysis problem and permits of mathe- 
matical expression. Assume that a complete factor solution in the sense of 
Section 2 is available. The covariances yielded by this solution,

$$\mu_{ij} = \sum_{p=1}^{s} a_{ip}a_{jp} + \delta_{ij}a_ia_j ,$$

will be regarded as adequately quantizing the unknown population
covariances. Define the variation, x, permissible in any component loading $a_{ip}$ to
be such that if $a_{ip} + x$ is substituted for $a_{ip}$ in the above expression, the
new covariances (and covariance matrix whose elements are these covariances)
must be such that they are within a region of random sampling variability
from $\mu_{ij}$ (and the matrix $\mu$). The sample size considered, of course, is N.
Further the problem is specialized to hold the diagonal entries fixed. This is 
equivalent to assuming that the sum of the communalities and uniquenesses 
of each variable is held constant and implies that any change in a common 
loading must be accompanied by a change in a uniqueness component loading. 
Finally, it is assumed that the orientation of the orthogonal reference frame 
set up by the initial solution remains undisturbed throughout the remaining 
study of component loading variation. 

In keeping with the definition set forth in the preceding paragraph, 
it is desired to determine extreme values for x such that if 





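$$a_{cs}(x) = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1p} & \cdots & a_{1s} \\
a_{21} & a_{22} & \cdots & a_{2p} & \cdots & a_{2s} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} + x & \cdots & a_{is} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{k1} & a_{k2} & \cdots & a_{kp} & \cdots & a_{ks}
\end{bmatrix} \tag{26}$$

and

$$a_u(x) = \begin{bmatrix}
a_1 & 0 & \cdots & 0 & \cdots & 0 \\
0 & a_2 & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & & \vdots \\
0 & 0 & \cdots & \sqrt{a_i^2 - 2a_{ip}x - x^2} & \cdots & 0 \\
\vdots & \vdots & & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & \cdots & a_k
\end{bmatrix} , \tag{27}$$

then the covariance matrix of a random sample of size N may be

$$m(x) = a_{cs}(x)a'_{cs}(x) + a_u(x)a'_u(x) = \begin{bmatrix}
\mu_{11} & \mu_{12} & \cdots & \mu_{1i} + a_{1p}x & \cdots & \mu_{1k} \\
\mu_{21} & \mu_{22} & \cdots & \mu_{2i} + a_{2p}x & \cdots & \mu_{2k} \\
\vdots & \vdots & & \vdots & & \vdots \\
\mu_{i1} + a_{1p}x & \mu_{i2} + a_{2p}x & \cdots & \mu_{ii} & \cdots & \mu_{ik} + a_{kp}x \\
\vdots & \vdots & & \vdots & & \vdots \\
\mu_{k1} & \mu_{k2} & \cdots & \mu_{ki} + a_{kp}x & \cdots & \mu_{kk}
\end{bmatrix} . \tag{28}$$

The results of the analysis of Section 2 provide a ready answer to the problem.
Formula (20) yields a measure of how much population and sample may
"be apart" at any desired significance level. The number of degrees of
freedom (which in this case is k − 1, as can be seen from the second form
of [28]) and the significance level fix $\lambda$; n = N − 1; $|\mu_{ij}|$ and $(\mu^{ij})$ are fixed
by the factor solution; and $|m_{ij}|$ and $m_{ij}$ contain the variable x.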

From a computational point of view, it is simpler if the component 
loading studied is in the last row of the matrix of common component loadings. 
This can be accomplished by placing the ith variable in the last position 
in the original data and accomplishing the corresponding change in the 
matrix $\mu$. However, the general position will be maintained in the following
development to emphasize the generality of the method. 

In the calculation of the quantities needed for the application of (20), 
it can be shown directly by writing out the expansion that

$$\sum_{i,j}\mu^{ij}m_{ij}(x) - k = 2Sx , \tag{29}$$

where

$$S = \sum_{\substack{j=1 \\ j \ne i}}^{k} \mu^{ij}a_{jp} . \tag{30}$$

Further, it can be shown by straightforward algebraic reduction that

$$|m_{ij}(x)| = |\mu_{ij}| + 2|\mu_{ij}|Sx + D^2x^2 , \tag{31}$$

where

$$D^2 = \begin{vmatrix}
\mu_{11} & \mu_{12} & \cdots & a_{1p} & \cdots & \mu_{1k} \\
\mu_{21} & \mu_{22} & \cdots & a_{2p} & \cdots & \mu_{2k} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{1p} & a_{2p} & \cdots & 0 & \cdots & a_{kp} \\
\vdots & \vdots & & \vdots & & \vdots \\
\mu_{k1} & \mu_{k2} & \cdots & a_{kp} & \cdots & \mu_{kk}
\end{vmatrix} . \tag{32}$$

For factorization of correlation matrices x is small and the term $D^2x^2$ may
be omitted for an approximation. If it is to be carried for more accuracy,
then $D^2$ may be evaluated directly or may be approximated by various
techniques to the desired degree of accuracy.

The expression (20) may be written with the notation introduced in
this section and with $\lambda$ transposed to the right as

$$f(x) = \log|\mu_{ij}| - \log\{|\mu_{ij}| + 2|\mu_{ij}|Sx + D^2x^2\} + 2Sx - d/n , \tag{33}$$

where d is fixed at the desired probability level for k − 1 degrees of freedom.
The roots of f(x) = 0 are then the values of x of permissible variation in
the component loading $a_{ip}$. There will be two such roots near x = 0 and
opposite in sign. In order to obtain a test of the significance of the
component loading $a_{ip}$, replace x by $-a_{ip}$ in all of the above considerations.
Example: Consider the example presented in Section 2 and the common
factor solution consisting of the first three columns of $a_{cs}$. In determining
the variation possible in $a_{11}$ at the 5% significance level again using the
sample size 100, the quantities needed in the application of (33) are

$|\mu_{ij}| = .0388$ ($|\mu_{ij}|$ here has unity for all of the diagonal elements),
$D^2 = -.0574$.

Substituting these values into (33), we have

$$f(x) = \log .0388 - \log\{.0388 - .0988x - .0574x^2\} - 2.5478x - .0789 .$$

Roots of f(x) are

$x_1 = .11$,
$x_2 = -.15$,

and thus $.55 < a_{11} < .81$ with a 5% significance level. Approximate
calculations were also made dropping off the term $D^2x^2$.
These yielded

$x_1 = .14$,
$x_2 = -.18$.
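
The roots quoted in this example can be checked numerically. The sketch below, in Python with only the standard library, locates them by bisection; the bracketing intervals, the helper names, and the choice of bisection are assumptions made here for illustration and are not part of the original computation. The constants are those printed in the expression for f(x).

```python
import math

# Constants as printed in the example.
DET_MU = 0.0388       # |mu| with unit diagonal
LINEAR = -0.0988      # 2 |mu| S
D2 = -0.0574          # D^2
TWO_S = -2.5478       # 2 S
D_OVER_N = 0.0789     # 5% chi-square point for 3 d.f., divided by n = 99


def f(x, with_quadratic=True):
    """f(x) of (33) for the example; with_quadratic=False drops the D^2 x^2 term."""
    det_m = DET_MU + LINEAR * x + (D2 * x * x if with_quadratic else 0.0)
    return math.log(DET_MU) - math.log(det_m) + TWO_S * x - D_OVER_N


def bisect(func, lo, hi, tol=1e-10):
    """Plain bisection; assumes func changes sign exactly once on [lo, hi]."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_lo * func(mid) <= 0:
            hi = mid
        else:
            lo, f_lo = mid, func(mid)
    return 0.5 * (lo + hi)


# One root on each side of x = 0 (brackets chosen by inspection).
x_pos = bisect(f, 0.01, 0.20)
x_neg = bisect(f, -0.25, -0.01)
x_pos_approx = bisect(lambda x: f(x, False), 0.01, 0.20)
x_neg_approx = bisect(lambda x: f(x, False), -0.25, -0.01)
print(f"with D^2 x^2:    {x_pos:.3f}, {x_neg:.3f}")
print(f"without D^2 x^2: {x_pos_approx:.3f}, {x_neg_approx:.3f}")
```

The roots obtained this way fall close to the printed values, and the comparison of the two pairs shows how much the quadratic term matters at this sample size.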


4. Application to the Sampling Variation in Oblique Component Loadings. 
In the application of component analysis in psychology, the final form of 
the results is often presented in terms of correlated components. In terms 
of these final components we have equation (5) of Section 1 for the popu- 
lation. Since the correlations between the common components are known, 
it is possible to calculate the changes in the reproduced covariances cor- 
responding to certain changes in the component loadings in a manner similar 
to that employed in the previous section. 

In order to use a notation comparable to that of Section 3, denote the 
reproduced covariances for s common components as: 


$$\mu_{ij} = \sum_{p=1}^{s} a_{ip}a_{jp} + \sum_{p=1}^{s-1}\sum_{q=p+1}^{s} (a_{ip}a_{jq} + a_{jp}a_{iq})\,r_{pq} + \delta_{ij}a_ia_j . \tag{34}$$

Now consider the amount x by which $a_{ip}$ can be changed and still have it
remain within sampling variation of the actual $a_{ip}$ computed. To determine
x substitute $a_{ip} + x$ for $a_{ip}$ in (34) and write the result as $m_{ij}$. Then, as in
Section 3, holding the diagonal fixed ($m_{ii} = \mu_{ii}$) by adjusting the uniqueness,
the new reproduced covariances are:

$$m_{ij} = \mu_{ij} + b_{jp}x \qquad (j = 1, 2, \cdots, k;\ j \ne i), \tag{35}$$

where

$$b_{jp} = a_{j1}r_{F_1F_p} + a_{j2}r_{F_2F_p} + \cdots + a_{js}r_{F_sF_p} . \tag{36}$$

Then the adjusted matrix $m_{ij}(x)$ takes on a form similar to that of (28)
with $b_{jp}$ replacing $a_{jp}$.
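
The coefficients of (36) are simple weighted sums, and a short Python sketch makes the bookkeeping explicit. The loadings and component intercorrelations below are read from the oblique solution tabulated in Section 5 (Table 3); where the scan is ambiguous the reading used is an assumption, and the function name is a hypothetical label introduced here.

```python
# Oblique loadings on components 1-3 for the thirteen tests (Table 3).
LOADINGS = [
    [.731, -.089, .142], [.441, .004, .004], [.721, -.090, -.142],
    [.508, .090, -.003], [-.058, .801, .087], [.037, .809, -.051],
    [-.068, .901, -.030], [.155, .591, .078], [-.068, .919, -.081],
    [-.385, .164, .809], [-.039, .077, .659], [.073, -.177, .773],
    [.351, -.061, .594],
]
# Intercorrelations of the components, with unit diagonal.
R = [[1.000, .587, .449],
     [.587, 1.000, .461],
     [.449, .461, 1.000]]


def b_column(loadings, r, p):
    """b_jp of (36) for every variable j; p is a 0-based component index."""
    return [sum(a_jq * r[q][p] for q, a_jq in enumerate(row)) for row in loadings]


b2 = b_column(LOADINGS, R, 1)   # needed when a loading on component 2 is varied
b1 = b_column(LOADINGS, R, 0)   # needed when a loading on component 1 is varied
for j, (u, v) in enumerate(zip(b2, b1), start=1):
    print(f"{j:2d}   b_j2 = {u:.3f}   b_j1 = {v:.3f}")
```

When a particular loading $a_{ip}$ is studied, the entry for j = i is simply left out of the sum that follows, since (35) applies only to j ≠ i.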
Now in much the same way as before define 


$$V = \sum_{\substack{j=1 \\ j \ne i}}^{k} \mu^{ij}b_{jp} . \tag{37}$$

Then approximate ranges for x are obtained by solving the equation

$$0 = \log|\mu_{ij}| - \log\{|\mu_{ij}| + 2|\mu_{ij}|Vx\} + 2Vx - \frac{d}{n} , \tag{38}$$

where n = N − 1 and d is fixed by the significance level with k − 1 degrees
of freedom. To obtain exact ranges, one should carry a term comparable to
$D^2x^2$ of Section 3 with $b_{jp}$ replacing $a_{jp}$. Since x is small the approximation
of (38) is quite good. Again it is possible to test the significance of a single
component loading by simply placing $x = -a_{ip}$ in (38). In this case the
criterion becomes

$$\lambda = n\{\log|\mu_{ij}| - \log[\,|\mu_{ij}| - 2|\mu_{ij}|Va_{ip}] - 2Va_{ip}\} , \tag{39}$$

where $\lambda$ has a chi-square distribution of k − 1 degrees of freedom. An
application of (39) is made in the last part of the illustrative example presented
in the next section.


5. Illustrative Example. In order to apply the results of preceding sec- 
tions, a thirteen-variable example discussed by Holzinger and Harman is 
considered (7, 30, 189, Appendix B, and Appendix E). The variables are 
certain psychological tests which are described briefly in Appendix B, p. 309. 
The table of intercorrelations is given on p. 30, and reproduced here as Table 1. 


TABLE 1
Intercorrelations of Thirteen Tests*

Test      1      2      3      4      5      6      7      8      9     10     11     12     13
   1  1.000   .318   .403   .468   .321   .335   .304   .332   .326   .116   .308   .314   .489
   2         1.000   .317   .230   .285   .234   .157   .157   .195   .057   .150   .145   .239
   3                1.000   .305   .247   .268   .223   .382   .184  -.075   .091   .140   .321
   4                       1.000   .227   .327   .335   .391   .325   .099   .110   .160   .327
   5                              1.000   .622   .656   .578   .723   .311   .344   .215   .344
   6                                     1.000   .722   .527   .714   .203   .353   .095   .309
   7                                            1.000   .619   .685   .246   .232   .181   .345
   8                                                   1.000   .532   .285   .300   .271   .395
   9                                                          1.000   .170   .280   .113   .280
  10                                                                 1.000   .484   .585   .408
  11                                                                        1.000   .428   .535
  12                                                                               1.000   .512
  13                                                                                      1.000

*Reproduced from Holzinger and Harman (7, 30).
The uniquenesses were then estimated, the common covariance matrix
was obtained, and a three-component centroid solution was calculated 
(7, 189). The results are repeated in Table 2. 


TABLE 2
Centroid Solution for Thirteen Tests*

                           Component Loadings
Variable   Uniqueness     C1       C2       C3
    1         .442       .607    -.060    -.443
    2         .797       .355     .038    -.266
    3         .638       .418     .148    -.429
    4         .686       .478     .083    -.287
    5         .354       .729     .257     .244
    6         .359       .707     .354     .167
    7         .250       .421     .367     .257
    8         .429       .705     .197     .062
    9         .242       .698     .409     .252
   10         .446       .455    -.482     .3899
   11         .551       .537    -.390     .145
   12         .469       .487    -.553     .033
   13         .401       .674    -.368    -.135

*Reproduced from Holzinger and Harman (7, 189).


In order to test the hypothesis that this factorization reproduces the 
intercorrelations, formula (20) was applied. The quantities needed are 


$|\mu_{ij}| = .00412$,

$(\mu^{ij})$:

Variable      1      2      3      4      5      6      7      8      9     10     11     12     13
       1  1.664  -.194  -.350  -.262  -.023  -.086   .008  -.178   .021   .150  -.072  -.173  -.417
       2         1.190  -.121  -.089  -.014  -.042  -.016  -.063  -.016   .079  -.005  -.028  -.114
       3                1.333  -.165  -.010  -.076  -.023  -.106  -.032   .220   .039   .011  -.155
       4                       1.336  -.041  -.078  -.056  -.099  -.059   .106  -.006  -.033  -.153
       5                              2.406  -.399  -.634  -.267  -.655  -.159  -.098  -.038  -.086
       6                                     2.370  -.632  -.276  -.668  -.004  -.021   .056  -.039
       7                                            2.999  -.398 -1.053  -.104  -.053   .080  -.001
       8                                                   2.115  -.410  -.040  -.070  -.056  -.156
       9                                                          3.016  -.041  -.011   .146   .055
      10                                                                 1.559  -.374  -.464  -.314
      11                                                                        1.567  -.331  -.306
      12                                                                               1.658  -.459
      13                                                                                      1.927

$\sum_{i,j}\mu^{ij}m_{ij} - k = .027$, and
$|m_{ij}| = .00240$.

Then, placing these results in (20), we have

$$\lambda = 144\{\log .00412 - \log .00240 + .027\} = 81.6, \quad 55 \text{ degrees of freedom}.$$

Using $\sqrt{2\chi^2} - \sqrt{2n - 1}$ (with n here denoting the 55 degrees of
freedom) and the normal approximation, we have

t = 2.33.

But P(t > 2.33) < .01. Thus it appears that a fourth component really
should be sought.
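
The arithmetic of this test is short enough to restate as a Python sketch using only the standard library. The function names are hypothetical labels introduced here; the inputs are the summary quantities printed above, so small differences in the last digit from the printed 81.6 and 2.33 reflect rounding only.

```python
import math


def criterion_from_summaries(n, det_mu, det_m, trace_minus_k):
    """Formula (20) evaluated from the summary quantities printed in the text."""
    return n * (math.log(det_mu) - math.log(det_m) + trace_minus_k)


def fisher_normal_deviate(chi_square, df):
    """Fisher's approximation: sqrt(2 chi^2) - sqrt(2 df - 1) is roughly N(0, 1)."""
    return math.sqrt(2.0 * chi_square) - math.sqrt(2.0 * df - 1.0)


def upper_tail_normal(t):
    """P(Z > t) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


k, s, n = 13, 3, 144                     # thirteen tests, three components, n = N - 1
df = (k - s) * (k - s + 1) // 2          # degrees of freedom from (25)
lam = criterion_from_summaries(n, 0.00412, 0.00240, 0.027)
t = fisher_normal_deviate(lam, df)
print(f"lambda = {lam:.1f}, d.f. = {df}, t = {t:.2f}, P = {upper_tail_normal(t):.4f}")
```

The normal approximation is used here only because 55 degrees of freedom lies beyond the range of most printed chi-square tables; with an exact chi-square routine the deviate step could be skipped.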

Formula (39) was then applied to test the significance of two of the 
oblique component loadings of this example. The oblique solution is given 
by Holzinger and Harman (7, 250, 251) and is reproduced in Table 3. 


TABLE 3
Oblique Solution for Thirteen Tests*

               Component Loadings          Adjusted
Variable       1        2        3        Uniqueness
    1        .731    -.089     .142          .432
    2        .441     .004     .004          .802
    3        .721    -.090    -.142          .619
    4        .508     .090    -.003          .682
    5       -.058     .801     .087          .343
    6        .037     .809    -.051          .347
    7       -.068     .901    -.030          .279
    8        .155     .591     .078          .460
    9       -.068     .919    -.081          .282
   10       -.385     .164     .809          .401
   11       -.039     .077     .659          .539
   12        .073    -.177     .773          .456
   13        .351    -.061     .594          .392

Intercorrelation of Components ($r_{F_iF_j}$):

          F1       F2       F3
F1      1.000     .587     .449
F2               1.000     .461
F3                        1.000

*Reproduced from Holzinger and Harman (7, 250, 251).


The adjusted uniquenesses are such that the diagonal entries in the reproduced 
covariance matrix are all unity. These are used in the application of the 
Dwyer-Guttman technique used in calculating the inverse and determinant
of our basic covariance matrix. Actual calculation of these quantities yields

$|\mu_{ij}| = .00425$

and $(\mu^{ij})$ is:

Variable      1      2      3      4      5      6      7      8      9     10     11     12     13
       1  1.697  -.195  -.363  -.266  -.025  -.090   .007  -.168   .018   .165  -.076  -.181  -.432
       2         1.184  -.122  -.087  -.016  -.043  -.016  -.058  -.015   .084  -.006  -.030  -.115
       3                1.372  -.168  -.014  -.082  -.025  -.102  -.031   .243  -.037   .008  -.164
       4                       1.345  -.045  -.084  -.055  -.093  -.055   .114  -.008  -.035  -.156
       5                              2.445  -.452  -.622  -.272  -.617  -.178  -.101  -.031  -.086
       6                                     2.410  -.623  -.282  -.630  -.008  -.023   .064  -.039
       7                                            2.731  -.353  -.861  -.104  -.049   .082   .001
       8                                                   1.978  -.349  -.041  -.066  -.049  -.145
       9                                                          2.673  -.043  -.011   .135   .049
      10                                                                 1.685  -.407  -.507  -.341
      11                                                                        1.607  -.334  -.309
      12                                                                               1.708  -.469
      13                                                                                      1.972


The common component loadings $a_{12,2}$ and $a_{13,1}$ of the above factor
pattern were tested for their significance. The following is the summary of
the computation following the notation of Section 4.

        $a_{12,2}$                    $a_{13,1}$
  $b_{1,2}$      .406           $b_{1,1}$      .743
  $b_{2,2}$      .265           $b_{2,1}$      .445
  $b_{3,2}$      .268           $b_{3,1}$      .604
  $b_{4,2}$      .387           $b_{4,1}$      .559
  $b_{5,2}$      .807           $b_{5,1}$      .451
  $b_{6,2}$      .807           $b_{6,1}$      .489
  $b_{7,2}$      .847           $b_{7,1}$      .447
  $b_{8,2}$      .718           $b_{8,1}$      .537
  $b_{9,2}$      .842           $b_{9,1}$      .435
  $b_{10,2}$     .311           $b_{10,1}$     .075
  $b_{11,2}$     .358           $b_{11,1}$     .302
  $b_{13,2}$     .419           $b_{12,1}$     .316

  V           -.392           V           -.940
  $\lambda$    1.57           $\lambda$    22.0

From the chi-square tables for 12 degrees of freedom, we find that $a_{12,2}$ is
not significantly different from zero, and $a_{13,1}$ is significant at approximately
the 2% level.
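
The last step of this example, from the tabulated coefficients to V and then to the criterion (39), can be sketched in Python as follows. The helper names are hypothetical, and one reading is an assumption: the element of $(\mu^{ij})$ for variables 8 and 13 is taken as −.145, since the sign is unclear in the printed matrix and the negative value is the one that reproduces the tabulated V.

```python
import math

# Column 13 of (mu^{ij}) for j = 1..12, and the b_j1 tabulated for a_{13,1}.
MU_INV_COL13 = [-.432, -.115, -.164, -.156, -.086, -.039,
                .001, -.145, .049, -.341, -.309, -.469]   # -.145 is an assumed sign
B_J1 = [.743, .445, .604, .559, .451, .489,
        .447, .537, .435, .075, .302, .316]


def weighted_sum_v(mu_inv_row, b):
    """V of (37): sum over j != i of mu^{ij} b_jp (the i-th term is already omitted)."""
    return sum(u * w for u, w in zip(mu_inv_row, b))


def loading_criterion(n, v, a_ip):
    """Criterion (39); |mu| cancels inside the difference of logarithms."""
    return n * (-math.log(1.0 - 2.0 * v * a_ip) - 2.0 * v * a_ip)


n = 144
v_13_1 = weighted_sum_v(MU_INV_COL13, B_J1)
lam_13_1 = loading_criterion(n, v_13_1, 0.351)     # a_{13,1} = .351
lam_12_2 = loading_criterion(n, -0.392, -0.177)    # V as tabulated, a_{12,2} = -.177
print(f"V = {v_13_1:.3f}, lambda(a_13,1) = {lam_13_1:.1f}, lambda(a_12,2) = {lam_12_2:.2f}")
```

Because (39) depends on $|\mu_{ij}|$ only through a ratio, the determinant itself is not needed once V is known; only the relevant row of the inverse enters.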
APPROXIMATING MAXIMUM TEST VALIDITY
BY A NON-PARAMETRIC METHOD


HAROLD WEBSTER*
UNIVERSITY OF KENTUCKY 


The Gleser-DuBois conditions for selecting from a number of test items 
those which will maximize the correlation between total test score and criterion 
will degenerate into expressions requiring only item counts on total distribu- 
tions and the upper halves of distributions. A grouping convention for scores 
near medians is recommended. The inefficiency of the method is easily
compensated for, because, regardless of the size of the sample, only standard
test-scoring equipment and brief computations are required. A procedure is
outlined, and some applications are discussed. 


1. Introduction. Gleser and DuBois† discuss methods for determining
which items of an experimental test to retain in order to approximate maxi- 
mum test-criterion validity. They describe a new method which has the de- 
sirable feature of allowing for the changes in item-test correlations which 
occur after a first selection of items, so that additional items may then be 
added or dropped in order to achieve still higher validity. The method as 
described by the authors requires item-test and item-criterion point-biserial 
correlations at each cycle, in addition to a product-moment correlation 
between test and criterion. The purpose of this paper is to describe a less 
laborious, though analogous, procedure with which dependable results may 
be obtained for dichotomized items when N is large. 

2. Derivation of the conditions for item selection. The Gleser-DuBois
condition to be met by item j in order to be retained in the first sub-test is
that its ratio of point-biserials

$$r_{jc}/r_{jt} \ge r_{tc} \ \text{ for } r_{jt} > 0, \quad \text{or} \quad r_{jc}/r_{jt} \le r_{tc} \ \text{ for } r_{jt} < 0, \tag{1}$$

t and c referring to test and criterion, respectively. By reference to correlation
formulas,‡ (1) may be seen to be equivalent to

$$\frac{(\bar{X}_{jc} - \bar{X}_c)S_t}{(\bar{X}_{jt} - \bar{X}_t)S_c} \gtreqless r_{tc}, \ \text{ for } (\bar{X}_{jt} - \bar{X}_t) \text{ positive or negative, respectively,§} \tag{2}$$

where $\bar{X}_{jc}$, $\bar{X}_{jt}$ are means on criterion and test for those individuals giving
one kind of (correct or preferred) response to item j, and $\bar{X}_c$, $\bar{X}_t$, $S_c$, $S_t$ are
the means and standard deviations for criterion and test.

*Now at Vassar College, Poughkeepsie, New York.

†Gleser, G. C., and DuBois, P. H. A successive approximation method of maximizing
test validity. Psychometrika, 1951, 16, 129-139.

‡Guilford, J. P. Fundamental statistics in psychology and education (2nd Ed.). New
York: McGraw-Hill, 1950.

§The convention used here and in subsequent expressions is that "positive or negative"
is read "positive" after ≥, "negative" after ≤.

If both criterion and test distributions are now cut at their medians, so
that scores which fall above medians are arbitrarily assigned a value 1 and
those below are assigned a value 0, both means and both standard deviations
become ½, and (2) may be rewritten

$$\frac{\bar{X}_{jC} - \tfrac{1}{2}}{\bar{X}_{jT} - \tfrac{1}{2}} \gtreqless r_{TC}, \ \text{ for } (\bar{X}_{jT} - \tfrac{1}{2}) \text{ positive or negative}, \tag{3}$$

where $r_{TC}$ is a fourfold point-correlation coefficient. Since both test scores
and criterion scores have been forced into 50-50 dichotomies, $r_{TC}$ has become

$$r_{TC} = \frac{\sum X_TX_C - N(\tfrac{1}{2})(\tfrac{1}{2})}{N\sqrt{(\tfrac{1}{4})(\tfrac{1}{4})}} = \frac{4a_{TC}}{N} - 1 . \tag{4}$$

The substitution $\sum X_TX_C = a_{TC}$ has been made in (4) because this value,
which may be obtained by counting, is merely the number of scores above
the median on the test which are paired in the summation with a criterion
score which falls above the criterion median. N is the size of the sample.

Because test and criterion distributions have been dichotomized, $\bar{X}_{jC}$
in (3) may be written $a_{jC}/n_j$, where $a_{jC}$ is the number of correct (or preferred)
responses to item j of individuals falling above the median on the criterion,
and $n_j$ is the total number of such responses for all individuals. Likewise
$\bar{X}_{jT} = a_{jT}/n_j$, the number of correct responses on item j given by those above
the median test score, divided by the total number of correct responses.
Making these substitutions, and calling the index for the initial selection of
items $I_1$, (3) becomes

$$I_1 = \frac{2a_{jC} - n_j}{2a_{jT} - n_j} \gtreqless \frac{4a_{TC}}{N} - 1, \ \text{ for } (2a_{jT} - n_j) \text{ positive or negative}. \tag{5}$$

The items selected by (5) form a sub-test $T_1$. The Gleser-DuBois
condition for further selection, either by adding new items or by dropping items
now in $T_1$, is

$$\frac{r_{jc}}{r_{jt_1} \pm S_j/2S_{t_1}} \gtreqless r_{t_1c}, \quad (r_{jt_1} \pm S_j/2S_{t_1}) \text{ positive or negative, the minus sign to be used only for items previously selected by (1).} \tag{6}$$


In (6), j refers to the item, c to the criterion, and the S's are standard
deviations, as before, but $t_1$ indicates that the statistic is now computed for the
sub-test $T_1$. Substituting expressions for the point-biserials and substituting
the value $\sqrt{p_jq_j}$ for $S_j$, the standard deviation of the item, the left side of
(6) becomes

$$\frac{(\bar{X}_{jc} - \bar{X}_c)S_{t_1}}{[(\bar{X}_{jt_1} - \bar{X}_{t_1}) \pm \tfrac{1}{2}q_j]\,S_c} , \tag{7}$$

where $q_j$ is the proportion of incorrect (nonpreferred) responses for item j.
For dichotomized test and criterion distributions, the means and standard
deviations of (7) are changed as before [for (3) and (5)], the only difference
being that t is now $t_1$. These plus the additional substitution in (7) of
$q_j = 1 - n_j/N$ result in a condition, analogous to (6), for a second selection of
items by means of a second item index $I_2$. Item j is selected to be in the
second sub-test $T_2$ if it was previously selected by (5) and if

$$I_2 = \frac{2a_{jC} - n_j}{2(a_{jT_1} - n_j) + n_j^2/N} \gtreqless \frac{4a_{T_1C}}{N} - 1, \ \text{ for positive or negative } 2(a_{jT_1} - n_j) + n_j^2/N, \tag{8a}$$

or if it was previously rejected by (5) and if

$$I_2 = \frac{2a_{jC} - n_j}{2a_{jT_1} - n_j^2/N} \gtreqless \frac{4a_{T_1C}}{N} - 1, \ \text{ for positive or negative } 2a_{jT_1} - n_j^2/N. \tag{8b}$$

Values used in (8a) and (8b) are the same as those in (5), except that $a_{jT_1}$ is
the number of correct responses on item j in the upper half of the $T_1$
distribution, and $a_{T_1C}$ is now also obtained by using $T_1$ instead of T. After rescoring
items selected to comprise $T_2$, (8a) and (8b) may be applied again for further
selection if this should be necessary.

3. Procedure for selecting the items. Items satisfying conditions (5) and 
(8a) or (8b) are most easily selected by the following procedure. 

(1) Obtain the responses for the items from which the final test is to be
selected on IBM answer sheets. Preferably there should be a large even
number of subjects.

(2) Score the answer sheets, recording two scores on each, the total test
score $T$ and the criterion score $C$.

(3) Separate the sheets into two equal piles, those with $C$ score above
and those with $C$ below the median $C$.

(4) Using only the $N/2$ sheets with high $C$ scores, first count and record
$a_{jC}$, the number of correct or preferred responses present for each item $j$,
and then obtain $a_{TC}$ by counting the number of total test scores $T$ which
fall above median $T$.

(5) Using only the $N/2$ sheets with low $C$ scores, obtain the number of
correct responses for each item $j$. Add these to the $a_{jC}$ obtained in Step (4)
to obtain $n_j$, the total number of correct responses for each item $j$.

(6) From the $N/2$ sheets with $T$ scores above median $T$ obtain $a_{jT}$, the
number of correct responses present for each item $j$.

(7) Obtain the ratio $I_1$ for each item (it is usually unnecessary to perform
the division), apply condition (5), and rescore the sheets with the total scores
$T_1$ obtained by using only the items selected by (5).

(8) From the $N/2$ sheets with $T_1$ scores above median $T_1$ obtain $a_{T_1 C}$,
the number with $C$ scores above median $C$, and $a_{jT_1}$, the number of correct
responses present for each item $j$.

(9) Obtain the ratio $I_2$ for each item, apply condition (8a) or (8b), and,
if further changes occur, rescore the sheets with $T_2$, the total scores using
only the items selected by (8a) or (8b).

(10) Repeat Steps 8 and 9 for $T_2$ and subsequent sub-tests if necessary.
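
The tallies of Steps 3 to 6 can be sketched in Python as follows. This is only an illustration: the function name, the list-of-lists layout of the answer sheets, and the breaking of ties at a median by sheet order are assumptions, not part of the procedure as published. The ratios of Steps 7 and 9 are not computed here.

```python
def median_split_counts(responses, criterion):
    """Tally the counts of Steps 3-6 for 0/1 item responses.

    responses: list of N rows, each a list of k item scores (1 = correct).
    criterion: list of N criterion scores C.
    Assumption: N is even; ties at a median are broken by sheet order,
    which simplifies the tie rule described in the text.
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    half = n_subjects // 2
    totals = [sum(row) for row in responses]  # total test score T per sheet

    def upper_half(scores):
        # Indices of the N/2 sheets with the highest scores (stable sort).
        order = sorted(range(n_subjects), key=lambda s: scores[s])
        return set(order[half:])

    high_c = upper_half(criterion)  # Step 3: sheets above median C
    high_t = upper_half(totals)     # sheets above median T

    a_jc = [sum(responses[s][j] for s in high_c) for j in range(n_items)]  # Step 4
    n_j = [sum(row[j] for row in responses) for j in range(n_items)]       # Step 5
    a_jt = [sum(responses[s][j] for s in high_t) for j in range(n_items)]  # Step 6
    a_tc = len(high_c & high_t)                                            # Step 4
    return {"a_jC": a_jc, "n_j": n_j, "a_jT": a_jt, "a_TC": a_tc}


# Hypothetical data: four sheets, two items.
sheets = [[1, 1], [1, 0], [0, 1], [0, 0]]
crit = [9, 7, 3, 1]
counts = median_split_counts(sheets, crit)
print(counts)
```

Because every quantity is a simple count over one half of the sheets, the same function can be reused for a sub-test by passing only the columns of the items retained.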

The division of $C$ scores into those above and below the median (Step
4) presents an additional problem if a number of sheets must be
chosen from a larger number, all of which have the same score value overlapping
the median. For very large $N$, random selection would probably
suffice, but for ordinary purposes, biasing the correlations can be avoided
by selecting those cases needed from the sheets with $T$ values nearest median
$T$. For example, if 5 out of 12 scores $C = 14$ were needed in the high $C$ score
group to bring it up to $N/2$ in size, then the 5 chosen would be those with $T$
scores nearest median $T$. An analogous procedure may be used in dividing
the $N$ sheets into high and low $T$ scores when it is necessary to select some of
them having identical values: the chosen portion would be those sheets
having $C$ scores nearest median $C$. With increasing $N$ and increasing score
ranges for $T$ and $C$, the chance of bias from a poor assignment decreases.
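
The grouping convention for tied scores can be sketched as below. The function name and argument names are hypothetical, $N$ is assumed even, and any ties that remain after comparing distances from the median are broken by sheet order, which the text does not specify.

```python
from statistics import median


def high_group(primary, secondary):
    """Indices of the N/2 sheets forming the high group on `primary`.

    Sheets tied at the cut score are admitted in order of how close their
    `secondary` score lies to the median of `secondary`.
    """
    n = len(primary)
    half = n // 2
    cut = sorted(primary)[n - half]  # lowest score that must enter the high half
    above = [s for s in range(n) if primary[s] > cut]
    tied = [s for s in range(n) if primary[s] == cut]
    med = median(secondary)
    tied.sort(key=lambda s: abs(secondary[s] - med))  # nearest the median first
    needed = half - len(above)
    return sorted(above + tied[:needed])


# Hypothetical scores: three sheets tie at C = 14 and one of them is needed.
c_scores = [16, 15, 14, 14, 14, 12]
t_scores = [30, 28, 20, 25, 35, 22]
chosen = high_group(c_scores, t_scores)
print(chosen)
```

Calling `high_group(t_scores, c_scores)` gives the analogous division on $T$, with ties resolved by closeness to median $C$.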

4. Some applications. The method which has been described was first 
applied to two experimental tests, one (N = 40) consisting of 8 items with
small or moderate inter-item and item-criterion correlations, the other 
(N = 100) consisting of 22 items with almost negligible inter-item and item- 
criterion correlations. The product-moment test-criterion correlation (based 
on full scores) for the first test was increased from .63 to .70 by the rejection 
of 3 items, and for the second test it was increased from .03 to .24 by the 
rejection of 14 items. 

The method was next applied to the Gleser-DuBois data* for a 10-item 
test using 20 hypothetical subjects. These authors showed that their method 
selected only the 5 odd-numbered items, which raised the test validity from 
.60 to .97. With the method of the present paper the same items were selected, 
but Item 10 was also incorrectly selected, achieving a validity of .94. A 
second application, which ignored the grouping convention described in Part 
3 above, selected Item 10 and also incorrectly rejected Item 1, resulting in a 
decrease in validity from .97 to .95. Considering the small size of the sample,
these failures of the non-parametric method to obtain maximum validity
are not surprising.

*Op. cit., Table 1, p. 138.

There is no reason to suppose that item-selection methods based upon 
regression of criteria on item composites can lessen the need for cross-valida- 
tion. On the contrary, such methods utilize inter-item covariation and are 
therefore at least as sensitive to sample peculiarities as less precise methods. 
In order to study the cross-validation problem further, the method was 
applied to a 20-item test which had been administered to 500 subjects for 
whom a continuous criterion was also available. The latter consisted of a 
masculinity-femininity scale with a range 0-9. The 500 cards containing the 
data were first divided into random halves, A and B, and the item selection 
was then carried out using only the 250 cases of Sample A. Validity co- 
efficients for Sample B, based on the same items before and after selection, 
were then also computed and may be compared in Table 1 with those of 
Sample A. As expected, the correlation for the shortened test on the cross- 
validation sample (.471) is less than that for the original sample (.495), 
although the difference is not significant. 
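
The text does not say which test lies behind the remark that the difference is not significant. One conventional check, assumed here rather than taken from the paper, is Fisher's z transformation for two correlations from independent samples:

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    return (z1 - z2) / se


# Validities of the shortened test in Samples A and B (250 cases each).
z = fisher_z_difference(0.495, 250, 0.471, 250)
print(round(z, 2))
```

With these values the statistic comes out near 0.35, far below the 1.96 required at the 5 per cent level, which agrees with the statement in the text.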


TABLE 1

Cross-Validation Data for Maximizing Test Validity by Item Selection*

                               Before Selection (20 Items)                               After Selection (10 Items)
Sample                  N    $\bar{X}_t$  $\sigma_t$  $\bar{X}_c$  $\sigma_c$  $r_{tc}$    $\bar{X}_t$  $\sigma_t$  $\bar{X}_c$  $\sigma_c$  $r_{tc}$
Original (A)           250      9.57       2.81       4.47       2.02      .429         4.96       1.98       4.47       2.02      .495
Cross-Validation (B)   250      9.70       2.83       4.52       1.94      .431         5.13       2.00       4.52       1.94      .471

*Subscripts $t$ and $c$ stand for total test score and criterion score, respectively.



In none of the cases discussed above was there any change in the items 
selected due to applications of (8a) and (8b), which suggests that these 
conditions are of limited practical value. Condition (6), when applied by 
Gleser and DuBois to their hypothetical 10-item test, resulted in no changes 
beyond those effected by (1). These authors cite a case (p. 137), however, in 
which the validity of a longer test was increased slightly by using (6). 

In summary, condition (5) appears to provide a practical criterion for 
selecting those items which will maximize total test validity. Perhaps appli- 
cations to tests comprised of more items than those discussed in this paper 
would be helpful in determining the usefulness of (8a) and (8b). 
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A NOTE ON THE NEYMAN-JOHNSON TECHNIQUE* 


ROBERT P. ABELSON
YALE UNIVERSITY 


A statistical problem which frequently arises in educational and 
psychological experimentation is that of testing the significance of the
difference of the mean scores of two groups on some criterion variable, 
where the differential effects of one or more variables which are correlated 
with the criterion must be statistically eliminated. The usual analytical 
technique for this type of problem is the analysis of covariance (9). The 
Neyman-Johnson technique (7) provides another, and substantially different, 
approach. A computational procedure is suggested here which utilizes the 
advantages of both techniques without an undue increase in computational 
labor. In addition, the Neyman-Johnson technique is generalized to the 
case of n predictor variables. Its application has heretofore been limited to a 
maximum of three predictor variables. 


Comparison of the Analysis of Covariance and the Neyman-Johnson Technique


Consider two groups of individuals, designated G and H. Suppose 
that measures are available for all individuals in both groups on a criterion 
variable, $y$, and on $r$ control variables, $x_1, x_2, \ldots, x_r$. The linear regressions
of $y$ on the $x$ for the separate groups may be written

$$\hat{y}_G = b_{0G} + b_{1G}x_1 + b_{2G}x_2 + \cdots + b_{rG}x_r, \qquad (1)$$
$$\hat{y}_H = b_{0H} + b_{1H}x_1 + b_{2H}x_2 + \cdots + b_{rH}x_r. \qquad (2)$$


An analysis of covariance properly requires (1) that three hypotheses be 
tested: 

A. The variances of the observed y’s about the regression surface are 
equal for the groups G and H. 

B. The corresponding regression slopes are identical for the two groups
($b_{1G} = b_{1H}$; $b_{2G} = b_{2H}$; $\cdots$; $b_{rG} = b_{rH}$).

C. The intercepts are equal for the two groups ($b_{0G} = b_{0H}$).


If either of the first two hypotheses is untenable, then any conclusions
which might be drawn from a test of Hypothesis C are to some extent vitiated.
If Hypothesis A is untenable, then theoretically the test of Hypothesis C
is illegitimate (although, practically speaking, the investigator may wish to
go ahead anyway). If Hypothesis B is rejected, the investigator may give
up the ghost, go ahead anyway, or, as is recommended here, apply the
Neyman-Johnson technique. For when the regression slopes are different, it is
clear from eqs. (1) and (2) that the difference $(\hat{y}_G - \hat{y}_H)$ and, in particular, the
significance of the difference $(\hat{y}_G - \hat{y}_H)$ is a function of the control variables, $x$.
In other words, certain specified segments of the populations giving rise to
groups G and H may differ significantly on the criterion variable, whereas
other segments do not. The segments are specified by locating certain sets of
values of $x_1, x_2, \ldots, x_r$, that is, certain regions of the "x-space." The "region of
significance" (5) of the Neyman-Johnson technique is defined as the set of
points in the x-space where one group is significantly better than the other on
the criterion variable.

*This paper was written while the author was a Psychometric Fellow of the Educational
Testing Service, Princeton, New Jersey.

In contrast with the analysis of covariance procedure, which tests 
Hypotheses A, B, and C in that order and never investigates the region of 
significance, Johnson (5) suggests, essentially, that Hypothesis C be tested 
first. He gives no test for Hypothesis A, although he has stated that its 
acceptance is necessary (6). If Hypothesis C is rejected (i.e., if the groups 
are significantly different on y), then the region of significance is always 
calculated. What frequently happens is that the region of significance turns 
out to be so large that it includes almost all values of the x’s which could 
ever reasonably be observed. (Such a region will usually be elliptical.) Ex- 
amples of this are in the literature (2, 3, 4, 5, 6). The calculation of such a 
region of significance adds no information to the analysis, and is performed 
at the expense of a good deal of computational labor. This waste can be 
avoided by the simple expedient of avoiding the Neyman-Johnson technique 
whenever it is found that there is no significant difference between the re- 
gression slopes (Hypothesis B is accepted). On the other hand, the technique 
can be quite useful when the regression slopes are significantly different 
(Hypothesis B is rejected). The compromise procedure, then, is to test 
Hypotheses A and B in that order. If B is accepted, test Hypothesis C. If 
B is rejected, use the Neyman-Johnson technique if it is felt that the region 
of significance will be of interest. Computational procedures for the tests of 
Hypotheses A, B, and C are given in (1). All the quantities needed to compute
the region of significance except $P_G^{-1}$ and $P_H^{-1}$ (see below) will already
be known if it is decided to perform this computation.


Computation of the Region of Significance for Any Number of Predictors 
Using maximum-likelihood methods, the Neyman-Johnson technique
sets up the ratio (the notation is the author's)

$$\frac{X D' D X'}{X U X'},$$

where $D$ is a vector each of whose entries represents the difference between
corresponding regression slopes for the two groups,

$$D = [(b_{0G} - b_{0H}),\ (b_{1G} - b_{1H}),\ \ldots,\ (b_{rG} - b_{rH})]. \qquad (3)$$

$X$ is a vector which specifies a set of fixed values of the control variables,

$$X = [x_0 \;\; x_1 \;\; x_2 \;\; \cdots \;\; x_r], \qquad (4)$$

where $x_0$ is defined as unity. $U$ is a matrix found from

$$U = t\,(P_G^{-1} + P_H^{-1}), \qquad (5)$$

where $t$ is the error variance of estimate for the total population including
both groups G and H. (The acceptance of Hypothesis A makes it legitimate
to use a weighted average of the error variances for groups G and H as an
estimate of $t$.)

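$P_G$ is the following matrix of data obtained from group G:

$$P_G = \begin{bmatrix}
n_G & \sum_a x_{1a} & \sum_a x_{2a} & \cdots & \sum_a x_{ra} \\
\sum_a x_{1a} & \sum_a x_{1a}^2 & \sum_a x_{1a}x_{2a} & \cdots & \sum_a x_{1a}x_{ra} \\
\sum_a x_{2a} & \sum_a x_{2a}x_{1a} & \sum_a x_{2a}^2 & \cdots & \sum_a x_{2a}x_{ra} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum_a x_{ra} & \sum_a x_{ra}x_{1a} & \sum_a x_{ra}x_{2a} & \cdots & \sum_a x_{ra}^2
\end{bmatrix}$$

($n_G$ is the number of individuals in group G and $a$ is a subscript denoting
the individual.) $P_H$ is the corresponding matrix for group H.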
The groups may be considered significantly different on the criterion
for all X-vectors such that

$$\frac{X D' D X'}{X U X'} \ge (n_G + n_H)\,\frac{1 - L_\gamma}{L_\gamma}, \qquad (6)$$

where $L$ is distributed according to a Beta distribution with
$(n_G + n_H - 4)/2$ and $\tfrac{1}{2}$ degrees of freedom, and $L_\gamma$ is the value of $L$
corresponding to the $\gamma\%$ tail of the curve. The above is a concise statement of the
Neyman-Johnson criterion of significance as it is given for two predictor variables
(5) and for three predictor variables (6). But, in fact, the statement holds
for any number of predictors, as we will now show by using a different line
of approach from that of Neyman and Johnson.

Consider the variable

$$\frac{\hat{y}_G - \hat{y}_H}{\sigma_{(\hat{y}_G - \hat{y}_H)}}.$$

This is the critical ratio of the difference between the predicted criterion
scores of the two groups. It is a function of the vector $X$, and is of course a
random variable, since it depends on the regression slopes which are subject
to sampling fluctuation.

If we denote the vector $[b_{0G}, b_{1G}, \ldots, b_{rG}]$ by $B_G$ and take into account
the fact that the $b$'s are distributed over an $(r + 1)$-variate normal surface
(10) with covariance matrix $tP_G^{-1}$, it is apparent that $B_G X'$ is distributed
normally with variance $X(tP_G^{-1})X'$. Thus $DX' = (B_G - B_H)X'$ is distributed
normally with variance $X(tP_G^{-1} + tP_H^{-1})X' = XUX'$. Since by definitions
(1), (2), (3), and (4), $\hat{y}_G - \hat{y}_H = DX'$, then

$$\frac{\hat{y}_G - \hat{y}_H}{\sigma_{(\hat{y}_G - \hat{y}_H)}} = \frac{DX'}{(XUX')^{1/2}}. \qquad (7)$$

Under the null hypothesis that the true value of $\hat{y}_G$ is equal to the
true value of $\hat{y}_H$, the ratio (7) is normally distributed with zero mean and
unit standard deviation. This assumes that the number of observations is
large enough so that $t$ is estimated without error. ($X$ is considered fixed.)
A significant difference on the criterion is found for all $X$ such that

$$\left|\frac{DX'}{(XUX')^{1/2}}\right| \ge \theta_\gamma, \qquad (8)$$

where $\theta$ is normally distributed with zero mean and unit standard deviation,
and $\theta_\gamma$ is the value of $\theta$ corresponding to the $\gamma\%$ tail of the curve. (8) is
approximately equivalent to (6), as can be shown in the following way:
$L$ is distributed as the Beta distribution

$$p(L)\,dL = \frac{1}{B\!\left(\frac{N-4}{2}, \frac{1}{2}\right)}\, L^{(N-6)/2}\,(1 - L)^{-1/2}\,dL, \qquad (9)$$

where $N = n_G + n_H$ = the total number of observations.
Set up the new variable

$$F = N\,\frac{1 - L}{L}. \qquad (10)$$

Then

$$L = \frac{N}{F + N}, \qquad (11)$$
$$1 - L = \frac{F}{F + N}, \qquad (12)$$
$$\frac{dL}{dF} = -\frac{N}{(F + N)^2}, \quad \text{and} \qquad (13)$$
$$p(F)\,dF = \frac{1}{\sqrt{N}\,B\!\left(\frac{1}{2}, \frac{N-4}{2}\right)} \left(1 + \frac{F}{N}\right)^{(3-N)/2} F^{-1/2}\,dF. \qquad (14)$$

Compare this with Snedecor's F distribution with 1 and $N - 4$ degrees
of freedom:

$$p(F)\,dF = \frac{1}{\sqrt{N-4}\,B\!\left(\frac{1}{2}, \frac{N-4}{2}\right)} \left(1 + \frac{F}{N-4}\right)^{(3-N)/2} F^{-1/2}\,dF. \qquad (15)$$

For $N$ of the order of 200, (14) is an excellent approximation to (15). But
an F distribution with 1 and $N - 4$ degrees of freedom is equivalent to the
distribution of the square of "Student's" $t$ with $N - 4$ degrees of freedom.

For large $N$, $t$ in turn is very nearly unit normally distributed, and thus
$N(1 - L)/L$ is approximately distributed as the square of a unit
normal deviate.

This establishes the approximate equivalence of (6) and (8).
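
The closeness of the two densities for $N$ near 200 can be checked numerically. The sketch below assumes, as the surrounding derivation indicates, that $L$ follows a Beta distribution with parameters $(N-4)/2$ and $1/2$ and that $F = N(1-L)/L$. The function names and the evaluation point 3.84 are chosen only for illustration.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def density_of_transformed_beta(f, n):
    """Density of F = N(1 - L)/L when L is Beta((N - 4)/2, 1/2)."""
    log_norm = -0.5 * math.log(n) - log_beta(0.5, (n - 4) / 2)
    return math.exp(log_norm + (3 - n) / 2 * math.log1p(f / n) - 0.5 * math.log(f))


def density_of_snedecor_f(f, n):
    """Density of Snedecor's F with 1 and N - 4 degrees of freedom."""
    log_norm = -0.5 * math.log(n - 4) - log_beta(0.5, (n - 4) / 2)
    return math.exp(log_norm + (3 - n) / 2 * math.log1p(f / (n - 4)) - 0.5 * math.log(f))


# Compare the two densities near the usual 5 per cent point for N = 200.
ratio = density_of_transformed_beta(3.84, 200) / density_of_snedecor_f(3.84, 200)
print(ratio)
```

A ratio close to unity at this point indicates that the two curves differ by only a few per cent in the region that matters for the significance test.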
Now the equation for the region of significance is found from (8), which
can be rewritten:

$$\frac{X D' D X'}{X U X'} \ge \theta_\gamma^2. \qquad (16)$$

Multiplying both sides of the inequality by $XUX'$ (which is always positive
since $U$ is positive definite), we have

$$X D' D X' \ge X \theta_\gamma^2 U X', \qquad (17)$$
$$X (D'D - \theta_\gamma^2 U) X' \ge 0. \qquad (18)$$

Setting

$$A = D'D - \theta_\gamma^2 U, \qquad (19)$$

we find that the sets of values for which the groups are significantly different
on the criterion are given by

$$X A X' \ge 0. \qquad (20)$$

Where $XAX' < 0$, there is no significant difference between the groups.
The boundary of the region of significance, $XAX' = 0$, is a quadratic surface
in $r$ dimensions, where $r$ is the number of predictor variables. $A$ is computed
from (3), (5), and (19), where $\theta_\gamma^2$ (say) = 3.8416. It will usually be quite
a task to plot the actual boundary, especially if $r > 2$, but analytic-geometrical
methods are available (8) for this process.
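
Although plotting the boundary is laborious, testing whether a single point of the x-space lies inside the region requires only the sign of $XAX'$. The sketch below assumes that $D$ and $U$ have already been computed. The function names and the numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
def quadratic_form(x, m):
    """Return x M x' for a vector x and a square matrix m (nested lists)."""
    size = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(size) for j in range(size))


def in_region_of_significance(d, u, predictors, theta_sq=3.8416):
    """True when X A X' >= 0, where A = D'D - theta_sq * U.

    d: differences between regression coefficients, intercept first.
    u: the (r + 1) x (r + 1) matrix U, assumed already computed.
    predictors: the r control-variable values; x_0 = 1 is prepended here.
    """
    x = [1.0] + list(predictors)
    dx = sum(di * xi for di, xi in zip(d, x))  # the scalar D X'
    # X (D'D) X' equals (D X')^2, so A need not be formed explicitly.
    return dx * dx - theta_sq * quadratic_form(x, u) >= 0.0


# Hypothetical values for a single predictor (r = 1).
d_example = [1.0, -0.5]
u_example = [[0.04, 0.0], [0.0, 0.01]]
inside_at_0 = in_region_of_significance(d_example, u_example, [0.0])
inside_at_2 = in_region_of_significance(d_example, u_example, [2.0])
print(inside_at_0, inside_at_2)
```

In this example the regression lines cross at $x_1 = 2$, so that point falls outside the region, while $x_1 = 0$ falls inside it. Evaluating the function over a grid of predictor values gives a rough picture of the region without solving for the quadratic boundary.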
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This study consists of four factor analyses of the Army Air Forces 
Aircrew Classification Batteries. The first was an analysis of the 1945 wartime 
battery, while the other three were analyses of the 1947 postwar battery, 
consisting of essentially the same variables, but using different samples. 
Eleven factors were found which had been identified and reported in previous 
analyses. An additional factor, possibly an artifact, was identified as an 
age-education doublet. The only factor which differed significantly in the 
analyses was pilot or flying interest. These factor analyses show that the 
factorial content of the tests remains quite similar in both wartime and 
postwar populations. 


Introduction 


During World War II, trained psychologists developed and administered 
the Army Air Forces Aircrew Classification Battery. An account of this work 
can be found in the Army Air Forces Aviation Psychology Program Research 
Reports (3, 4). A report of the validity of the battery under peacetime 
conditions and the results of postwar research on the improvement and re- 
vision of the battery are found in (1). The purpose of this paper is to compare 
the factorial content of the battery on postwar samples with wartime samples 
and to draw inferences concerning the stability of the factorial pattern. 


Analysis of the Data 


Four separate analyses were made. The first was on the 23-variable 
June 1945 Aircrew Classification Battery. This wartime sample consisted of 
8574 unclassified aviation trainees tested at Keesler Field during the summer 
of 1945. The other three were on the February 1947 Aircrew Classification 
Battery, which was administered to 1511 basic pilot trainees between Feb- 
ruary, 1948, and April, 1949. The trainees were divided into two groups: 
1000 trainees with no previous flying experience and 511 with previous
flying experience, and a separate analysis was made for each. The third
postwar analysis used the combined group with previous flying experience
included as an additional variable. A more detailed description of the samples
is given in (6), and descriptions of the test variables are given in (3), (4),
and (6).

*The data reported in this study were collected as part of the United States Air
Force Human Resources Research and Development Program and described in Research
Bulletin 52-16. The opinions or conclusions contained in this report are those of the authors.
They are not to be construed as reflecting the view or indorsement of the Department of
the Air Force.

Factors were extracted from these matrices by Thurstone’s centroid 
method (5). In all cases, more factors were extracted than were hypothesized 
to be in the matrix. The criteria used to determine when to cease extracting 
factors were numerous, and the most liberal were usually used. The centroid 
loadings were rotated to psychologically meaningful factors with attempts to 
get positive manifold and simple structure. The Zimmerman orthogonal 
graphical method (7) was used for rotations. The centroid and rotated 
loadings are reported in their entirety in (6). Table 1 gives a comparison of 
communalities of the variables in the four analyses. 


Interpretation of Results 


The significant loadings (.30 or greater) of the variables on each factor
are given below. The $\sum a^2/k$ is given by factors to show the average variance
contributed by each factor for all the variables in the battery, and permits
a comparison of the variance extracted on each factor in the different analyses.
The factors are the same as those identified and described by French (2) and 
Guilford (3). The factor loadings for 12 interpreted factors are presented 
in Table 2. 
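
The index $\sum a^2/k$ can be illustrated with a short computation. The sketch assumes that the sum runs over the loadings of all $k$ variables in the battery, including those below the .30 cutoff that are not tabled. The function name and the loadings are hypothetical.

```python
def mean_squared_loading(loadings):
    """Average squared loading of one factor over all k variables."""
    return sum(a * a for a in loadings) / len(loadings)


# Hypothetical loadings of four variables on a single factor.
example_loadings = [0.60, 0.50, 0.10, 0.00]
value = mean_squared_loading(example_loadings)
print(value)
```

Because the divisor is the number of variables in the battery, values from analyses with different numbers of variables are placed on a common footing.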

The identification of the Socio-Economic Background factor differs
somewhat from its interpretation in wartime research, in which it was termed
Mathematical Background (3). Attention is also called to the fact that in
the case of the factor identified as Pilot or Flying Interest the $\sum a^2/k$ is
larger for the total postwar study, since it included the previous flying
experience variable. The factor termed Age-Education Doublet is presumably
an artifact. The minimum age in the samples being 20 years, it is hypothesized
that the men with more education were in general older: hence the
positive relation between age and education.


Summary and Conclusions 


1. These factor analyses of two aircrew classification batteries show that 
the factorial content on postwar populations remains quite similar to that 
found on the wartime population. The stability of the factor pattern is 
evident from the factor loadings reported. The stability of the communalities 
is also shown. 

2. The only factor which had any significant difference in the analyses 
was one identified as Pilot or Flying Interest. This factor is identified by 
the strong relationship with the variable of previous flying experience. 
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TABLE 1 
Comparison of Communalities of the Four Analyses*

                                              Keesler      Postwar Sample
Variable                                      Sample    Total    NFE†    PFE‡

Age                                              —        23      19      20
Education                                        —        36      38      31
Arithmetic Reasoning, CI206C                    57        56      62      57
Biographical Data (Bomb. O.), CE602D            25        40      41      38
Biographical Data (Pilot), CE602D               57        48      49      40
Coordinate Reading, CP224B                      55        54      56      58
Dial and Table Reading, CP621-622A              67        65      65      73
General Information, CE505F                     55        73      56      64
Instrument Comprehension, CI616C                45        48      44      43
Mechanical Information, CI905B                  60        60      59      69
Mechanical Principles, CI903B                   65        64      64      65
Numerical Operations I (Front), CI702B          68        65      66      63
Numerical Operations II (Back), CI702B          73        67      67      7
Practical Judgment, CI301C                      34        34      29      40
Reading Comprehension, CI614H                   59        49      56      5
Spatial Orientation I, CP501B                   54        53      55      53
Spatial Orientation II, CP503B                  td        40      43      37
Speed of Identification, CP610A                 44        46      44      49
Complex Coordination, CM701E                    55        55      55      62
Discrimination Reaction Time, CP611D            44        32      38      34
Finger Dexterity, CM116A                        29        32      32      30
Rotary Pursuit, CP410B                          43        39      41      41
Rudder Control, CM120C                          49        76      42      30
Two-Hand Coordination, CM101B                    —        50      53      52
Previous Flying Experience                       —        66       —       —
Pedestal Sight Manipulation, CM724A             18         —       —       —
Two-Hand Pursuit, CM810A                        40         —       —       —

*Decimal points omitted.
†NFE: No Flying Experience.
‡PFE: Previous Flying Experience.
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TABLE 2 


Rotated Factor Loadings for Twelve Factors Based on Four Samples

                                                        Keesler      Postwar Samples
Factor                                                  Sample    Total    NFE*    PFE†

Mechanical Experience
  Tests: Mechanical Information, CI905B                   2        .68      as      a
         General Information, CE505F                     .47       .53     .60     .49
         Mechanical Principles, CI903B                   .52       .48     .48     .49
         Biographical Data (Pilot), CE602D               .45       .51     .50     .40
  $\sum a^2/k$                                           .07       .06     .07     .06
Psychomotor Coordination
  Tests: Complex Coordination, CM701E                    .49       .41     .52     .52
         Rotary Pursuit, CP410B                          .52       .33     .42     .35
         Rudder Control, CM120C                          .55       .44     .42     .31
         Two-Hand Coordination, CM101B                    —        .41     .51     .44
         Two-Hand Pursuit, CM801A                        .40        —       —       —
         Pedestal Sight Manipulation, CM724A             .34        —       —       —
  $\sum a^2/k$                                           .06       .03     .05     .04
Perceptual Speed
  Tests: Spatial Orientation I, CP501B                   .65       .64     .64     .66
         Speed of Identification, CP610A                 .58       .63     .61     .56
         Spatial Orientation II, CP503B                  .49       .48     .48     .42
         Coordinate Reading, CP224B                      .38       .38     .42     .39
         Dial and Table Reading, CP621-622A              .28       .31     .35     .o2
  $\sum a^2/k$                                           .07       .06     .06     .06
Socio-Economic Background
  Tests: Biographical Data (Bomb. O.), CE602D            .38       .60     .60     .56
         Education                                        —        .44     .46     .45
         Biographical Data (Pilot), CE602D               .53       .28     .36     .31
  $\sum a^2/k$                                           .02       .03     .03     .03
Numerical Facility
  Tests: Numerical Operations I (Front), CI702B          .76        7      .74     .69
         Numerical Operations II (Back), CI702B           a        .78     .76     i.
         Dial and Table Reading, CP621-622A              .50       .47     .48     .43
         Arithmetic Reasoning, CI206C                    .38       .38     .38     .35
         Coordinate Reading, CP224B                      .32       .29     .35     .33
  $\sum a^2/k$                                           .08       .08     .07     .07
General Reasoning
  Tests: Arithmetic Reasoning, CI206C                    .38       .49     .49     .42
         Mechanical Principles, CI903B                   .38       .34     .40     .30
  $\sum a^2/k$                                           .03       .03     .03     .03
Psychomotor Precision or Finger Dexterity
  Tests: Finger Dexterity, CM116A                        .38       .51     .47     .42
         Rotary Pursuit, CP410B                          .32       .42     .38     .44
  $\sum a^2/k$                                           .02       .03     .03     .03

*No Flying Experience.
†Previous Flying Experience.
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TABLE 2 (Cont.) 
Rotated Factor Loadings for Twelve Factors Based on Four Samples

                                                        Keesler      Postwar Samples
Factor                                                  Sample    Total    NFE     PFE

Pilot or Flying Interest
  Tests: General Information, CE505F                     .49       .57     .32     .54
         Rudder Control, CM120C                          .15       .66     .30     .36
         Previous Flying Experience                       —        a &      —       —
  $\sum a^2/k$                                           .02       .06     .02     .03
Spatial Relations
  Tests: Dial and Table Reading, CP621-622A              .42       .39     .36     .49
         Coordinate Reading, CP224B                      .40       .37     .44     .38
         Complex Coordination, CM701E                    .37       .41     .39     .38
         Two-Hand Coordination, CM101B                    —        .36     .36     .35
         Instrument Comprehension, CI616C                .33       .32     .34     .37
         Discrimination Reaction Time, CP611D            .37       .32     .36     .36
  $\sum a^2/k$                                           .04       .04     .04     .05
Verbal Comprehension
  Tests: Reading Comprehension, CI614H                   .58       .56     .67     .56
         Practical Judgment, CI301C                      .42       .41     .40     .44
         Arithmetic Reasoning, CI206C                    .41       .33     .40     .44
  $\sum a^2/k$                                           .05       .03     .05     .04
Visualization
  Tests: Mechanical Principles, CI903B                   .33       .46     .37     .47
         Instrument Comprehension, CI616C                .24       .32     .27     .08
  $\sum a^2/k$                                           .02       .03     .02     .02
Age-Education Doublet
         Age                                              —        .36     .31     .33
         Education                                        —        .21     .28     .20
  $\sum a^2/k$                                            —        .01     .01     .01
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EXPERIMENTAL INDEPENDENCE 
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Tautologies are established for the reliability coefficient pj of the sum 
of n part scores. It is not assumed that the part scores are experimentally 
independent of each other nor that the parts are equivalent to each other. 
The tautologies show the exact role played by experimental dependence 
and nonequivalence of parts, respectively, in the reliability of the sum. The 
formal algebra is appropriate to reliability in the sense of repeated trials 
of the same test, as well as in the sense of a universe of parallel tests, although 
the empirical meanings are different. Emphasis is on practical formulas that 
require information from only a single experiment (or test). These can take 
the form only of lower bounds to $\rho_t^2$, four of which are developed.


1. Introduction 


The posing of the problem of speeded—or noncompleted—tests by 
Gulliksen (2), Cronbach and Warrington (1), and others, emphasizes anew 
the need for scrutinizing the foundations of the various reliability coefficients 
popularly in use. The Spearman-Brown coefficients, the Kuder-Richardson 
formulas, the Guttman lower-bounds, and all related coefficients which 
refer to the reliability of a sum and which are based on but a single trial, 
have an assumption in common. They all either implicitly or explicitly 
assume that the scores being summed are experimentally independent. 

For noncompleted tests, it is apparent that such an assumption of 
independence is incorrect. If each respondent is scored zero on each item 
that he does not attempt, then there is, in general, a serial experimental 
dependence among the later test items—especially in time-limit tests. 

Even in power tests, where all items are attempted by everyone, ex- 
perimental dependence may hold between items, as when answering a given 
item correctly depends on how one answers the preceding items. Many other 
examples can be suggested where the assumption of independence of experi- 
mental errors is not appropriate. 

The purpose of the present paper is to present some general reliability 
formulas that make no assumptions at all about experimental independence. 
These formulas, then, apply to the reliability of the sum of any item scores, 
whether independence holds or not. 

The basic and exact formulas developed here are mathematical identities
or tautologies. As such, they have no immediate practical use. Their importance
lies in the fact that they provide a universal framework from which different
practical formulas are easily derived. For a formula to be practical, it has to
make some assumption as to the nature of the experimental dependence
among items. Our tautologies show the exact mathematical nature, location,
and role of such assumptions.

The tautologies can be modified to yield practical formulas, especially
lower bounds to $\rho_t^2$. One example of an applied formula which ensues is for
speeded or noncompleted tests, based on an exact statement of the serial 
dependence involved. Another practical example is that of the limiting case 
of experimental independence; the resulting formulas coincide with certain 
previously known ones, as should be expected. The specification of any 
other type of experimental dependence leads just as immediately to the 
appropriately modified and practical formulas from our general ones. 

Both the theoretical and practical types of formulas of the present 
paper avoid also any assumptions concerning the mutual equivalence of 
the items within the test. Instead, they show the exact role of the different 
aspects of nonequivalence, and how no knowledge of the nature of the non- 
equivalence is needed in practice. 


2. The Respective Frameworks of Retests and of Parallel Tests 


We are concerned with the reliability of the sum of $n$ part scores, where
$n \ge 2$. Often a test will be composed of $n$ items, and the part scores are
simply the scores on the respective items. In other cases, the test may be 
composed of m > n items, and each part score may be based on more than 
one item; for example, the test may be split into two parts and thus have 
n = 2, although each part may include dozens of items. 

We assume nothing about “equivalence” of parts of the test in any 
sense whatsoever. For example, a test of 100 dichotomous items may be 
split to have n = 2, with one part score based on but one item and the other 
part score based on 99 items.* 

The notion of reliability with which we are concerned relates to a uni- 
verse—usually hypothetical—of repeated experiments with the same test 
on the same population of respondents. 

Each part score has its own reliability, which is separately defined. 
To study the reliability of part scores separately—especially when each is 
based on but a single item—can be done in practice in either of two ways: 
(a) actual repeated experiments, provided there is no effect of one trial on 
the next; or (b) correlation on a single trial with an outside variable—or set 
of variables—experimentally independent of the part score. The latter 
technique yields a lower bound to the retest reliability coefficient of the 
part score [cf. (3), especially pp. 277-279]. 

*It may be further commented that some items may be included in both parts,
as far as our general formulas below are concerned: the resulting experimental dependence
is but a special case subsumed under the general formulas.

It is in the test as a whole, or the sum of $n \ge 2$ part scores, that our
immediate interest lies, and not in the part scores themselves. To study the
reliability of a sum, the two techniques of the preceding paragraph are again
available, but now also a third technique is possible, based on a single trial
of the test alone. Since at least two part scores are available, internal
correlations can be studied on but a single trial, and lower bounds can be
established in practice to the theoretical retest coefficient from an infinite number
of repeated experiments of the test. The internal correlations give some 
evidence as to the reliability of each of the part scores separately, and our 
analysis capitalizes on this information to learn about the reliability of the 
sum. 

The kind of reliability coefficient we have implied thus far here has 
been called the retest coefficient, referring to repeated experiments on the 
same test. Our algebra and formulas can be seen to hold—with modification 
only of interpretation—if a universe of parallel tests is thought of, instead 
of a universe of experiments with the same test. As long as the universe 
remains hypothetical—whether of experiments or of parallel tests—only the 
same numerical lower bounds can be derived in practice. 

The difference between retest theory and parallelism theory occurs 
when two or more experiments are actually made on the same test, and two 
or more parallel tests are actually constructed and used [cf. (4)]. Since the 
present paper is aimed at practical formulas based on only a single trial, its 
results hold equally well for parallelism theory where only a single test is 
used in practice. The difference is in interpretation and not in the formal 
algebra of the present paper. For convenience, we shall use the terminology 
of retest theory here, rather than of parallelism theory, although the ter- 
minologies are interchangeable within the limits of the present algebra. 

It is important to distinguish parallelism between tests from parallelism 
within tests. Parallelism within a test refers to equivalence of the part scores 
of a test to each other. This is assumed most fully, for example, in the Spear- 
man-Brown “prophecy” formula, and to a considerable extent in the Kuder- 
Richardson formulas. As already emphasized above, equivalence of part 
scores is in no way assumed in the present paper. Only parallelism between 
tests or experiments is of relevance to our formulas, and not within tests. 
And since we shall deal in practice with only a single trial or test, leaving 
the rest of the universe hypothetical, no problem of actually studying paral- 
lelism exists as far as the practical formulas of the present paper are concerned. 


3. Definitions and Notation 


We need a triple-subscript notation, one subscript for the respondent,
one for the part score, and one for the trial (or parallel test). Let $x_{ijk}$
be the score of the $i$th respondent on the $j$th part score on the $k$th trial.

The population of respondents will be considered indefinitely large, so
that $i$ has an unlimited range. Actually, our basic formulas hold for a finite
population, but observability of the needed parameters from but a single
trial does require an infinite population (3); otherwise, sampling error due to
the trial arises, and we do not wish to discuss such sampling error in the
present paper. For the same kind of reasons, the (hypothetical) universe of
experiments (parallel tests) will also be assumed to be indefinitely large,
and the subscript $k$ is also unbounded.

The total number of part scores for the test is some finite number $n$,
where $n \ge 2$, so the range for $j$ is $j = 1, 2, \ldots, n$.

The total score on the test for person $i$ on trial $k$ will be denoted by
$t_{ik}$, and by definition:

$$t_{ik} = \sum_{j=1}^{n} x_{ijk}. \tag{1}$$

The mean or “expected” value for person $i$ on the $j$th part score over
all trials will be denoted by $X_{ij}$, or

$$X_{ij} = E_k\, x_{ijk}. \tag{2}$$

The symbol $E$ as usual denotes here the mathematical expectation
(arithmetical mean) over the indicated subscript. The expected total score for
person $i$ will be denoted by $T_i$, and

$$T_i = E_k\, t_{ik}. \tag{3}$$

The $X_{ij}$ and $T_i$ can be thought of as what are conventionally called “true”
scores on the parts and total test, respectively. Taking expectations over
both members of (1), and using (2) and (3), we derive the identity

$$T_i = \sum_{j=1}^{n} X_{ij}. \tag{4}$$

The error of unreliability in the $k$th trial for person $i$ on part score $j$
is defined to be the difference between observed and expected values,
$x_{ijk} - X_{ij}$. The variance of these errors for the $i$th person will be
denoted by $\sigma^2_{x_{ji}}$, or

$$\sigma^2_{x_{ji}} = E_k\, (x_{ijk} - X_{ij})^2. \tag{5}$$

Correspondingly, the variance of the unreliability errors of person $i$ on the
total test is defined to be

$$\sigma^2_{t_i} = E_k\, (t_{ik} - T_i)^2. \tag{6}$$

The reliability coefficient itself depends not only on the variations
within people as defined by (6), but also on the variations among people.
To study the latter, we need expectations also over the subscript $i$. Let
$\xi_j$ denote the general mean of the $j$th part score over all persons as well
as trials,

$$\xi_j = E_i\, X_{ij} = E_i E_k\, x_{ijk}, \tag{7}$$

and let $\tau$ denote the corresponding general mean for total test scores,

$$\tau = E_i\, T_i = E_i E_k\, t_{ik}. \tag{8}$$

Using (1), (8), and (7), we obtain the identity:

$$\tau = \sum_{j=1}^{n} \xi_j. \tag{9}$$

The general variance, over persons and trials, for the $j$th part score will
be denoted by $\sigma^2_{x_j}$, and

$$\sigma^2_{x_j} = E_i E_k\, (x_{ijk} - \xi_j)^2. \tag{10}$$

Similarly, the general variance for the total test scores will be denoted by
$\sigma_t^2$, and

$$\sigma_t^2 = E_i E_k\, (t_{ik} - \tau)^2. \tag{11}$$

The reliability coefficient of the total scores depends on how large the
error variances (6) are on the average compared to $\sigma_t^2$. The mean error
variance within people will be denoted by $\epsilon^2$, where

$$\epsilon^2 = E_i\, \sigma^2_{t_i}. \tag{12}$$

The reliability coefficient itself is defined as

$$\rho_t^2 = 1 - \frac{\epsilon^2}{\sigma_t^2}. \tag{13}$$

We shall see that $\rho_t^2$ is bounded between zero and unity. If
$\epsilon^2 = 0$, then there is no variation at all in the $t_{ik}$ from trial to
trial, and $\rho_t^2 = 1$, or the test is perfectly reliable for the given
population. The maximum that $\epsilon^2$ can attain is $\sigma_t^2$, as will be
shown, in which case $\rho_t^2 = 0$, or the test is said to have no reliability
for the population; the variation within people equals the total variation.
Intermediate values of $\rho_t^2$ indicate intermediate ratios of within-persons
variance to the general variance.
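
As an illustration of definition (13), the following minimal Python sketch computes $\rho_t^2$ for a small finite universe of scores. It assumes a complete array `x[i][j][k]` with the same number of trials for every person, so that each expectation reduces to an arithmetic mean. The function names and the toy data are hypothetical and are not taken from the paper.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def reliability(x):
    """Return rho_t^2 of (13) for scores x[i][j][k] (person, part, trial)."""
    # (1): total score of person i on trial k, summing over the parts.
    t = [[sum(parts_on_trial) for parts_on_trial in zip(*person)] for person in x]
    T = [mean(row) for row in t]   # (3): expected total score per person
    tau = mean(T)                  # (8): general mean of total scores
    # (6) and (12): error variance within each person, averaged over people.
    eps2 = mean(mean((tik - Ti) ** 2 for tik in row) for row, Ti in zip(t, T))
    # (11): general variance over persons and trials.
    var_t = mean(mean((tik - tau) ** 2 for tik in row) for row in t)
    return 1 - eps2 / var_t        # (13)


# Hypothetical universe: 2 persons, 2 parts, 2 trials.
x = [
    [[0, 1], [0, 1]],  # person 0: each inner list holds one part's trials
    [[1, 2], [1, 2]],  # person 1
]
rho2 = reliability(x)
```

In this toy universe the within-person variation is half of the general variance, so the coefficient falls midway between the two extremes described above.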

It should be noted that no empirical assumptions of any kind have
been made in arriving at definition (13).* Equations (1) through (13) are
all either definitions or derived mathematical identities, and lay no
restrictions on the test, people, or trials.

*Apart from the tacit assumptions that the expected values over $k$ and $i$,
respectively, exist, in the sense of convergence in the limit when $k$ and $i$
are unbounded in range. Such convergence always is assured if the total scores
are limited in magnitude to a finite range, as they invariably are for
psychological tests.

The practical problem is to obtain empirical information about $\epsilon^2$,
especially from but a single trial. It has been shown elsewhere (3) how
$\sigma_t^2$ can be computed exactly from a single trial.† It has also been shown
how bounds to $\epsilon^2$ and $\rho_t^2$ can be set from but a single trial, using
the special assumption of experimental independence among part scores. We now
wish to study $\epsilon^2$ and $\rho_t^2$, but without necessarily assuming
experimental independence of the parts.


4. Further Parameters 


The covariance, over experiments, of the errors of unreliability between
the $j$th and the $g$th part score for individual $i$ will be denoted by

$$\gamma_{x_{gi}x_{ji}} = E_k\, (x_{igk} - X_{ig})(x_{ijk} - X_{ij}). \tag{14}$$

If the two part scores are experimentally independent for the $i$th person,
then the covariance $\gamma_{x_{gi}x_{ji}}$ vanishes (3, 263-266). If the part
scores are dependent for person $i$, then $\gamma_{x_{gi}x_{ji}}$ in general
differs from zero. Different laws of dependence yield different patterns of
values for the $n(n-1)/2$ different covariances defined by (14). Practical use
of our formulas below will depend on specifying the particular pattern for the
particular data.

Ultimately, we shall not need the law of (14) for each person, but only
the quantity we shall call $\delta$, where

$$\delta = \sum_{g \ne j} E_i\, \gamma_{x_{gi}x_{ji}}. \tag{15}$$

$\delta$ is defined by first averaging each covariance over all people, and then
by summing over all covariances, omitting self-covariances or variances proper.
In the case of experimental independence it must be that $\delta = 0$. The size
and sign of $\delta$ depend on the nature of the experimental dependence of all
the part scores.

As usual, the covariance of a variable with itself is its variance, or
(14) yields $\sigma^2_{x_{ji}}$ when $g = j$:

$$\gamma_{x_{ji}x_{ji}} = \sigma^2_{x_{ji}}. \tag{16}$$


We also need parameters of the reliable or “true” parts of the
observations. The variance of the $T_i$ over people will be denoted by

$$\sigma_T^2 = E_i\, (T_i - \tau)^2. \tag{17}$$
+The only assumption used for this purpose is that denoted there as C,, to the effect 


that there is no copying of one person from another nor any other form of experimental 
dependence between people (3, 266).

The corresponding variances of the $X_{ij}$ are $\sigma^2_{X_j}$, where

$$\sigma^2_{X_j} = E_i\, (X_{ij} - \xi_j)^2. \tag{18}$$

The covariance between $X_{ig}$ and $X_{ij}$ over people is

$$\gamma_{X_g X_j} = E_i\, (X_{ig} - \xi_g)(X_{ij} - \xi_j). \tag{19}$$

Again, (19) reduces to (18) for $g = j$.

Finally, we need the general covariance, over people and trials, between
each pair of part scores:

$$\gamma_{x_g x_j} = E_i E_k\, (x_{igk} - \xi_g)(x_{ijk} - \xi_j). \tag{20}$$

Formula (20) reduces to (10) for $g = j$.
If all the part scores were equivalent, then the $\sigma^2_{X_j}$ would be
mutually equal for all $j$. Let $\alpha^2$ denote the variance over items of the
$\sigma^2_{X_j}$:

$$\alpha^2 = \frac{\sum_{j=1}^{n} \sigma^4_{X_j}}{n} - \left( \frac{\sum_{j=1}^{n} \sigma^2_{X_j}}{n} \right)^2. \tag{21}$$

Then equivalence would imply $\alpha^2 = 0$. We shall not assume this. Indeed
in practice, it is generally true that $\alpha^2 > 0$. It turns out that practical
formulas for $\rho_t^2$, in the form of lower bounds, are obtainable with no
knowledge or assumptions at all about $\alpha^2$.
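It is useful to define two further parameters, $\Gamma_1$ and $\Gamma_2$, where

$$\Gamma_1 = \sum_{g \ne j} \gamma_{X_g X_j} = \sum_{g=1}^{n} \sum_{j=1}^{n} \gamma_{X_g X_j} - \sum_{j=1}^{n} \sigma^2_{X_j} \tag{22}$$

and

$$\Gamma_2 = \sum_{g \ne j} \gamma^2_{X_g X_j} = \sum_{g=1}^{n} \sum_{j=1}^{n} \gamma^2_{X_g X_j} - \sum_{j=1}^{n} \sigma^4_{X_j}. \tag{23}$$

Thus, $\Gamma_1$ is twice the sum of the $n(n-1)/2$ different covariances defined
by (19), and $\Gamma_2$ is twice the sum of their squares.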
If the part scores were all mutually equivalent, then all $n(n-1)/2$
covariances defined by (19) would be equal. Let $\beta^2$ denote the variance of
the covariances. Then it is easily seen from (23) and (22) that

$$\beta^2 = \frac{1}{n(n-1)} \left[ \Gamma_2 - \frac{\Gamma_1^2}{n(n-1)} \right]. \tag{24}$$

Equivalence of part scores would imply $\beta^2 = 0$. We shall not assume this,
for in practice $\beta^2 > 0$ in general. Unlike $\alpha^2$, it is possible often
in practice to estimate $\beta^2$. For example, in the case of experimental
independence, $\Gamma_1$ and $\Gamma_2$ can be computed exactly from but a single
trial, leading to the lower bound $\lambda_2$ (3). However, the computations may
often be cumbersome. We shall arrive at a useful formula which requires no
knowledge of $\beta^2$.

A final parameter that we shall need will be denoted by $\phi^2$. If we let
$\rho_{X_g X_j}$ denote the correlation coefficient over people of the $X_{ig}$
and $X_{ij}$, that is:

$$\rho_{X_g X_j} = \frac{\gamma_{X_g X_j}}{\sigma_{X_g} \sigma_{X_j}}, \tag{25}$$

then we define $\phi^2$ to be the double sum:

$$\phi^2 = \sum_{g=1}^{n} \sum_{j=1}^{n} (1 - \rho^2_{X_g X_j})\, \sigma^2_{X_g} \sigma^2_{X_j}. \tag{26}$$

Since the parentheses on the right are always non-negative, it follows that
$\phi^2$ is always non-negative. If all parts of the test were equivalent, then
all correlations defined by (25) would be unity. In such a case, we would have
$\phi^2 = 0$ in (26) by virtue of the identical vanishing of the parentheses on
the right. Again, we assume no such equivalence, and we have $\phi^2 > 0$ in
general. Our practical formulas require no knowledge of the actual value of
$\phi^2$.
To summarize this section, we have defined certain over-all parameters:
$\delta$, $\alpha^2$, $\beta^2$, and $\phi^2$. These will occur in our universal
identities for $\rho_t^2$, but only the first, $\delta$, requires any
specifications in practice. $\delta$ is a function only of the experimental
dependence among the part scores, whereas the other three parameters reflect
the nonequivalence of the $n$ parts of the test. For a test composed of $n$
equivalent parts (which is not to be expected in practice) we would have
$\alpha^2 = \beta^2 = \phi^2 = 0$.


5. Some Fundamental Identities; the First Lower Bound 
One of the most important identities of reliability theory is the following:

$$\sigma_t^2 = \epsilon^2 + \sigma_T^2, \tag{27}$$

where the three terms involved are defined by (11), (12), and (17),
respectively. This states that the general variance of the test scores is the
sum of the variance of “true” scores plus the variance of the errors of
unreliability. Equality (27) follows immediately from definitions (11), (12),
and (17), with no assumptions whatsoever.* To see this, write the identity:

$$t_{ik} - \tau = (t_{ik} - T_i) + (T_i - \tau). \tag{28}$$

Squaring both sides and taking expectations over $k$ and $i$ yields (27). The
expectation of the crossproduct of the two parentheses on the right vanishes,
since

$$E_k\, (t_{ik} - T_i)(T_i - \tau) = (T_i - \tau)\, E_k\, (t_{ik} - T_i), \tag{29}$$

and the expectation on the right is obviously zero.

*This is in contrast to some conventional formulations that need to introduce
special hypotheses of zero correlations between “true” score and errors in
order to arrive at a formula like (27).
Using (27), we can rewrite (13) also as

$$\rho_t^2 = \frac{\sigma_T^2}{\sigma_t^2}. \tag{30}$$

This shows that $\rho_t^2 \ge 0$. Also, from (27), we see that
$\sigma_T^2 \le \sigma_t^2$. Hence, from (30), we have $\rho_t^2 \le 1$. Therefore,
we have now proved the previous assertion that $\rho_t^2$, as defined by (13),
varies between zero and unity.
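
The decomposition (27) and the agreement of forms (13) and (30) can be checked numerically. The sketch below is only an illustration under the assumption of a finite universe with equally many trials per person; the function name and the score table `t` are hypothetical.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def variance_components(t):
    """t[i][k] is the total score of person i on trial k.

    Returns (var_t, eps2, var_T) as defined by (11), (12), and (17).
    """
    T = [mean(row) for row in t]   # (3): "true" total score per person
    tau = mean(T)                  # (8): general mean
    eps2 = mean(mean((tik - Ti) ** 2 for tik in row) for row, Ti in zip(t, T))
    var_T = mean((Ti - tau) ** 2 for Ti in T)
    var_t = mean(mean((tik - tau) ** 2 for tik in row) for row in t)
    return var_t, eps2, var_T


# Hypothetical total scores: 3 persons, 3 trials each.
t = [[3, 5, 4], [6, 8, 10], [1, 1, 4]]
var_t, eps2, var_T = variance_components(t)

rho2_from_13 = 1 - eps2 / var_t   # definition (13)
rho2_from_30 = var_T / var_t      # equivalent form (30)
```

No assumption about the correlation of "true" scores and errors is used here; the cross-product term vanishes person by person, exactly as in (29).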
A further identity, proved in a manner similar to that for (27), is as
follows:

$$\gamma_{x_g x_j} = E_i\, \gamma_{x_{gi}x_{ji}} + \gamma_{X_g X_j}. \tag{31}$$

This follows from definitions (20), (14), and (19), as can be seen by expanding
the brackets and taking expectations in the identity

$$(x_{igk} - \xi_g)(x_{ijk} - \xi_j) = [(x_{igk} - X_{ig}) + (X_{ig} - \xi_g)]\,[(x_{ijk} - X_{ij}) + (X_{ij} - \xi_j)]. \tag{32}$$


Sum both members of (20) over $g$, use (1) and (9), sum again over $j$
and again use (1) and (9), and then use (11), to establish the identity:

$$\sigma_t^2 = \sum_{g=1}^{n} \sum_{j=1}^{n} \gamma_{x_g x_j}. \tag{33}$$

Similarly, from summing over (19) it follows that

$$\sigma_T^2 = \sum_{g=1}^{n} \sum_{j=1}^{n} \gamma_{X_g X_j}. \tag{34}$$

Still another identity of the same sort follows from summing over both
members of (14) and using (6):

$$\sigma^2_{t_i} = \sum_{g=1}^{n} \sum_{j=1}^{n} \gamma_{x_{gi}x_{ji}}. \tag{35}$$
Taking expectations over $i$ in (35) and using (12) establishes the next
identity,

$$\epsilon^2 = \sum_{g=1}^{n} \sum_{j=1}^{n} E_i\, \gamma_{x_{gi}x_{ji}}. \tag{36}$$

From (15) and (16), we can rewrite (36) as

$$\epsilon^2 = \sum_{j=1}^{n} E_i\, \sigma^2_{x_{ji}} + \delta. \tag{37}$$

If $\delta$ is zero in (37), as in the case of experimental independence of the
parts of the test, then (37) would reduce to a statement that the total error
variance is precisely the sum of the part error variances. But if
$\delta \ne 0$, this nonzero value must also be taken into account in (37).
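To transform (37) into a more practical form, we notice that (31)
becomes, for $g = j$,

$$\sigma^2_{x_j} = E_i\, \sigma^2_{x_{ji}} + \sigma^2_{X_j}. \tag{38}$$

Summing both members of (38) over $j$ yields

$$\sum_{j=1}^{n} \sigma^2_{x_j} = \sum_{j=1}^{n} E_i\, \sigma^2_{x_{ji}} + \sum_{j=1}^{n} \sigma^2_{X_j}. \tag{39}$$

Then, using (39) in (37), we obtain finally the very important identity:

$$\epsilon^2 = \sum_{j=1}^{n} \sigma^2_{x_j} - \sum_{j=1}^{n} \sigma^2_{X_j} + \delta. \tag{40}$$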


Identity (40) gives us immediately a practical upper bound to $\epsilon^2$:

$$\epsilon^2 \le \sum_{j=1}^{n} \sigma^2_{x_j} + \delta. \tag{41}$$

Each $\sigma^2_{x_j}$ is observable from but a single trial (3, 281, remembering
only assumption C, is used), and $\delta$ is specified by the law of dependence
of the given data. For the case of experimental independence, with
$\delta = 0$, (41) leads to the lower bound $\lambda_1$ to $\rho_t^2$ as stated in
(3). The more general lower bound, from (41) and (13), will be denoted by
$\lambda_1^*$, and

$$\lambda_1^* = 1 - \frac{\sum_{j=1}^{n} \sigma^2_{x_j} + \delta}{\sigma_t^2}. \tag{42}$$

Clearly,

$$\lambda_1^* \le \rho_t^2 \le 1. \tag{43}$$

6. Further Identities and the Second Lower Bound 


The loss of information which makes $\lambda_1^*$ a lower bound is due to the
fact that $\sum_{j=1}^{n} \sigma^2_{X_j}$ is not observable on a single trial. At
least two experimentally independent trials are needed to establish the
variation in “true” scores. However, we can improve on $\lambda_1^*$ even from a
single trial, by studying the sum of the $\sigma^2_{X_j}$ more carefully, as we
shall now do.

We shall establish the following identity:

$$\sum_{j=1}^{n} \sigma^2_{X_j} = \sqrt{\frac{n}{n-1} \left( \Gamma_2 + n\alpha^2 + \phi^2 \right)}. \tag{44}$$

One way of doing this is to rewrite (26) by expanding its right member,
remembering (25):

$$\phi^2 = \left( \sum_{j=1}^{n} \sigma^2_{X_j} \right)^2 - \sum_{g=1}^{n} \sum_{j=1}^{n} \gamma^2_{X_g X_j}. \tag{45}$$

From (23), and then (21),

$$\sum_{g=1}^{n} \sum_{j=1}^{n} \gamma^2_{X_g X_j} = \Gamma_2 + n\alpha^2 + \frac{1}{n} \left( \sum_{j=1}^{n} \sigma^2_{X_j} \right)^2. \tag{46}$$

Using (46) in (45), collecting and transposing terms, multiplying through
by $n/(n-1)$, and then taking square roots, yields (44).

In the right of (44), $\alpha^2$ and $\phi^2$ are not observable from a single
trial. But $\Gamma_2$ is possibly observable. The observability of $\Gamma_2$
depends, according to (23) and (31), on the observability of the
$\gamma_{x_g x_j}$ and on the $E_i\, \gamma_{x_{gi}x_{ji}}$. Now the
$\gamma_{x_g x_j}$ are observable from a single trial, as shown in (3). For the
$E_i\, \gamma_{x_{gi}x_{ji}}$, an appropriate law of experimental dependence must
be specified for $g \ne j$. Hence, given the specification, the
$\gamma_{X_g X_j}$ can be observable, and $\Gamma_2$ (also $\Gamma_1$) can be
observable. We do not need information on $\alpha^2$ and $\phi^2$, then, but can
use the practical inequality that follows from (44):

$$\sum_{j=1}^{n} \sigma^2_{X_j} \ge \sqrt{\frac{n}{n-1}\, \Gamma_2}. \tag{47}$$

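From (40) and (47), then

$$\epsilon^2 \le \sum_{j=1}^{n} \sigma^2_{x_j} + \delta - \sqrt{\frac{n}{n-1}\, \Gamma_2}. \tag{48}$$

Therefore, if we define $\lambda_2^*$ to be

$$\lambda_2^* = 1 - \frac{\sum_{j=1}^{n} \sigma^2_{x_j} + \delta - \sqrt{\frac{n}{n-1}\, \Gamma_2}}{\sigma_t^2}, \tag{49}$$

it must be that $\lambda_2^*$ is a lower bound to $\rho_t^2$ according to (48) and
(13):

$$\lambda_2^* \le \rho_t^2 \le 1. \tag{50}$$

This $\lambda_2^*$ is a generalization of the lower bound $\lambda_2$ given
elsewhere for the case of experimental independence (3), and reduces to the
latter when the $\gamma_{x_{gi}x_{ji}}$ (and hence also $\delta$) vanish.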

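It is of interest to see how $\lambda_1^*$ and $\lambda_2^*$ can be written in a
different form, using the notation of $\Gamma_1$. Sum both members of (31) over
all $g$ and $j$, and remember (33), (15), (16), and (22):

$$\sigma_t^2 = \left( \delta + \sum_{j=1}^{n} E_i\, \sigma^2_{x_{ji}} \right) + \left( \Gamma_1 + \sum_{j=1}^{n} \sigma^2_{X_j} \right). \tag{51}$$

Transposing, and using (39), we have

$$\sum_{j=1}^{n} \sigma^2_{x_j} + \delta = \sigma_t^2 - \Gamma_1. \tag{52}$$

Using (52) in (42) shows that

$$\lambda_1^* = \frac{\Gamma_1}{\sigma_t^2}, \tag{53}$$

and using (52) in (49) shows that

$$\lambda_2^* = \frac{\Gamma_1 + \sqrt{\frac{n}{n-1}\, \Gamma_2}}{\sigma_t^2}. \tag{54}$$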
7. The Third Lower Bound; Application to Noncompleted Tests 
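Another identity that can be written for $\sum_{j=1}^{n} \sigma^2_{X_j}$ is as
follows:

$$\sum_{j=1}^{n} \sigma^2_{X_j} = \sqrt{\left( \frac{\Gamma_1}{n-1} \right)^2 + n^2\beta^2 + \frac{n}{n-1} \left( n\alpha^2 + \phi^2 \right)}. \tag{55}$$

This follows by transforming (24) into

$$\Gamma_2 = n(n-1)\beta^2 + \frac{\Gamma_1^2}{n(n-1)} \tag{56}$$

and then substituting (56) into (44).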
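
Identities (44), (55), and (56) can be verified numerically once a "true" covariance matrix is supposed known. The following sketch assumes a hypothetical 3-part matrix `cov_true` of the $\gamma_{X_g X_j}$ (in practice this matrix is not observable from a single trial); the function name is likewise an illustration only.

```python
from math import sqrt


def nonequivalence_parameters(cov_true):
    """Return (Gamma_1, Gamma_2, alpha^2, beta^2, phi^2) from the gamma_{X_g X_j}."""
    n = len(cov_true)
    var = [cov_true[j][j] for j in range(n)]
    pairs = [(g, j) for g in range(n) for j in range(n) if g != j]
    gamma1 = sum(cov_true[g][j] for g, j in pairs)                  # (22)
    gamma2 = sum(cov_true[g][j] ** 2 for g, j in pairs)             # (23)
    alpha2 = sum(v * v for v in var) / n - (sum(var) / n) ** 2      # (21)
    beta2 = (gamma2 - gamma1 ** 2 / (n * (n - 1))) / (n * (n - 1))  # (24)
    # (26): (1 - rho^2) * var_g * var_j equals var_g * var_j - cov_gj^2.
    phi2 = sum(var[g] * var[j] - cov_true[g][j] ** 2
               for g in range(n) for j in range(n))
    return gamma1, gamma2, alpha2, beta2, phi2


# Hypothetical "true" covariance matrix for n = 3 non-equivalent parts.
cov_true = [[1.0, 0.3, 0.2],
            [0.3, 2.0, 0.5],
            [0.2, 0.5, 1.5]]
n = len(cov_true)
gamma1, gamma2, alpha2, beta2, phi2 = nonequivalence_parameters(cov_true)

lhs = sum(cov_true[j][j] for j in range(n))    # sum of the "true" variances
rhs_44 = sqrt(n / (n - 1) * (gamma2 + n * alpha2 + phi2))
rhs_55 = sqrt((gamma1 / (n - 1)) ** 2 + n ** 2 * beta2
              + n / (n - 1) * (n * alpha2 + phi2))
rhs_56 = n * (n - 1) * beta2 + gamma1 ** 2 / (n * (n - 1))
```

Dropping the non-negative terms in `alpha2`, `phi2`, and `beta2` from the radicals can only make the right members smaller, which is what turns these identities into usable inequalities.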

Identity (55) enables us to weaken $\lambda_2^*$ in order to save the labor of
computing $\Gamma_2$. According to (55), we have the practical inequality
(omitting $\beta^2$, $\alpha^2$, and $\phi^2$)

$$\sum_{j=1}^{n} \sigma^2_{X_j} \ge \frac{|\Gamma_1|}{n-1}. \tag{57}$$

Notice that it is the absolute value of $\Gamma_1$ that appears on the right.
$\Gamma_1$ can be negative as well as positive. But certainly, if (57) is true,
then we can also remove the absolute value sign,

$$\sum_{j=1}^{n} \sigma^2_{X_j} \ge \frac{\Gamma_1}{n-1}. \tag{58}$$

It turns out that there is no real loss of information in doing so, as far as
our next lower bound $\lambda_3^*$ is concerned; for if $\Gamma_1$ is negative, no
positive bound can be obtained in any event.

Now, from (52),

$$\Gamma_1 = \sigma_t^2 - \sum_{j=1}^{n} \sigma^2_{x_j} - \delta. \tag{59}$$











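Therefore, if we define our third lower bound to be

$$\lambda_3^* = \frac{n}{n-1} \left( 1 - \frac{\sum_{j=1}^{n} \sigma^2_{x_j} + \delta}{\sigma_t^2} \right), \tag{60}$$

then, from (59), (58), (40), and (13), we have

$$\lambda_3^* \le \rho_t^2 \le 1. \tag{61}$$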


A comparison of (60) with (42) shows that $\lambda_3^*$ differs from
$\lambda_1^*$ only by the factor $n/(n-1)$,

$$\lambda_3^* = \frac{n}{n-1}\, \lambda_1^*. \tag{62}$$

Also, from (53) and (54), we see that if $\Gamma_1 > 0$, then

$$\lambda_1^* \le \lambda_3^* \le \lambda_2^*. \tag{63}$$

The best of the three bounds is $\lambda_2^*$, but the most convenient for general
use will undoubtedly be $\lambda_3^*$. When $\delta = 0$, $\lambda_3^*$ coincides
with the $\lambda_3$ of our previous paper (3).
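
The three bounds can be computed together from the working forms (53), (54), and (62). The sketch below is an illustration under stated assumptions: `cov` holds the general covariances $\gamma_{x_g x_j}$ of the part scores, treated as known population values, and `dep` holds assumed values of $E_i\,\gamma_{x_{gi}x_{ji}}$ supplied by whatever law of dependence is specified. Both names, and the example numbers, are hypothetical.

```python
from math import sqrt


def lower_bounds(cov, dep=None):
    """Return (lambda_1*, lambda_2*, lambda_3*).

    cov[g][j]: general covariance of part scores g and j, as in (20).
    dep[g][j]: assumed E_i gamma_{x_gi x_ji} for g != j; None means
               experimental independence, so that delta = 0.
    """
    n = len(cov)
    if dep is None:
        dep = [[0.0] * n for _ in range(n)]
    pairs = [(g, j) for g in range(n) for j in range(n) if g != j]
    var_t = sum(sum(row) for row in cov)          # (33)
    sum_var = sum(cov[j][j] for j in range(n))    # sum of part variances
    delta = sum(dep[g][j] for g, j in pairs)      # (15)
    gamma1 = var_t - sum_var - delta              # (59)
    # (23), with each "true" covariance obtained from (31).
    gamma2 = sum((cov[g][j] - dep[g][j]) ** 2 for g, j in pairs)
    lam1 = gamma1 / var_t                                   # (53)
    lam2 = (gamma1 + sqrt(n / (n - 1) * gamma2)) / var_t    # (54)
    lam3 = n / (n - 1) * lam1                               # (62)
    return lam1, lam2, lam3


# Hypothetical covariance matrix of three part scores.
cov = [[1.0, 0.2, 0.6],
       [0.2, 1.0, 0.4],
       [0.6, 0.4, 1.0]]
lam1, lam2, lam3 = lower_bounds(cov)

# A hypothetical law of dependence: every pair of parts shares 0.1 of error covariance.
dep = [[0.0, 0.1, 0.1],
       [0.1, 0.0, 0.1],
       [0.1, 0.1, 0.0]]
lam1_dep, lam2_dep, lam3_dep = lower_bounds(cov, dep)
```

With a positive `delta`, part of the observed covariance is attributed to correlated errors rather than to "true" scores, so all three bounds come out lower than under independence.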

As an application of $\lambda_3^*$, let us consider the case of speeded or
noncompleted tests. A complete analysis will be given in a later paper (5), and
we state here only one of the results. If the $x_{ijk}$ are scored only in the
range between zero and unity (say each part score is based on one item), if
unattempted items are all scored zero, and if the only experimental dependence
among the items is a pure serial relation in the attempts (omitting one
item implying omitting all the following ones), then it can be shown that

332 >(Vi-a, xy ve), (64) 
where 7, is the proportion who attempted item g, and &; is again as defined 
in (7). Using (64) in (60) yields a modified lower bound, say \3 , somewhat 
smaller than \% because of the inequality in (64). This provides a solution 
to the problem of studying the reliability of speeded tests, taking into account 
the experimental dependence that occurs. 


8. Summary of the Formulas 


In this paper we have developed two kinds of formulas: identities and 
inequalities. Neither kind has involved any hypotheses or assumptions 
about the data, so that both apply universally. The identities are not usable 
in practice when data are available from only a single experiment, but the 
inequalities are.

It is useful to assemble the identities into direct formulas for $\rho_t^2$. One
such formula is

$$\rho_t^2 = 1 - \frac{\sum_{j=1}^{n} \sigma^2_{x_j} + \delta - \sqrt{\frac{n}{n-1} \left( \Gamma_2 + n\alpha^2 + \phi^2 \right)}}{\sigma_t^2}. \tag{65}$$

This follows from (13), (40), and (44). Using (59), we can also write (65) as

$$\rho_t^2 = \frac{\Gamma_1 + \sqrt{\frac{n}{n-1} \left( \Gamma_2 + n\alpha^2 + \phi^2 \right)}}{\sigma_t^2}. \tag{66}$$

In (65) and (66), only $\alpha^2$ and $\phi^2$ are necessarily unobservable in a
single trial. They depend on the equivalence of the part scores of the test.
Identities (65) and (66) show the exact role of nonequivalence for reliability;
absence of knowledge of the actual values still leaves possible universal lower
bounds to $\rho_t^2$.

Another identity for $\rho_t^2$ can be written as follows:

$$\rho_t^2 = \frac{\Gamma_1 + \sqrt{\left( \frac{\Gamma_1}{n-1} \right)^2 + n^2\beta^2 + \frac{n}{n-1} \left( n\alpha^2 + \phi^2 \right)}}{\sigma_t^2}. \tag{67}$$

This follows from using (56) in (66). It may be helpful to write again here
the working formula for $\Gamma_1$:

$$\Gamma_1 = \sigma_t^2 - \sum_{j=1}^{n} \sigma^2_{x_j} - \delta. \tag{59}$$

While the quantity $\beta^2$ in the right of (67) may be observable, being based
on $\Gamma_1$ and $\Gamma_2$, the cumbersome calculations may be omitted in the
practical inequalities, at the expense of some weakening of the inequalities.
$\Gamma_2$ does not appear explicitly in (67).

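Omitting the (nonnegative) radical in (65), (66), or (67) leads to lower
bound $\lambda_1^*$:

$$\lambda_1^* = 1 - \frac{\sum_{j=1}^{n} \sigma^2_{x_j} + \delta}{\sigma_t^2}. \tag{42}$$

Omitting only the (nonnegative) terms involving $\alpha^2$ and $\phi^2$ in (66)
leads to the second lower bound:

$$\lambda_2^* = 1 - \frac{\sum_{j=1}^{n} \sigma^2_{x_j} + \delta - \sqrt{\frac{n}{n-1}\, \Gamma_2}}{\sigma_t^2}. \tag{49}$$

Omitting the (nonnegative) terms in $\beta^2$, $\alpha^2$, and $\phi^2$ in (67)
leads to the third lower bound:

$$\lambda_3^* = \frac{n}{n-1} \left( 1 - \frac{\sum_{j=1}^{n} \sigma^2_{x_j} + \delta}{\sigma_t^2} \right). \tag{60}$$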
An important special case is when $n = 2$, or the test is divided into
two parts. This leads to “split-half” formulas. When $n = 2$ then $\beta^2$ is
identically zero; there is only one “true” covariance, $\gamma_{X_1 X_2}$, so
there is no variation among “true” covariances. For this special case, (66) and
(67) become

$$\rho_t^2 = \frac{\Gamma_1 + \sqrt{\Gamma_1^2 + 4\alpha^2 + 2\phi^2}}{\sigma_t^2}, \quad (n = 2), \tag{68}$$

where now

$$\Gamma_1 = \sigma_t^2 - \sigma^2_{x_1} - \sigma^2_{x_2} - \delta, \quad (n = 2), \tag{69}$$

and

$$\delta = 2\, E_i\, \gamma_{x_{1i}x_{2i}}, \quad (n = 2). \tag{70}$$

In this special case of $n = 2$, if $\Gamma_1 \ge 0$, we always have
$\lambda_2^* = \lambda_3^*$, according to (49) and (60). Because of the importance
of this joint special case of $\lambda_2^*$ and $\lambda_3^*$, we shall label its
lower bound separately, as $\lambda_4^*$, where

$$\lambda_4^* = 2 \left( 1 - \frac{\sigma^2_{x_1} + \sigma^2_{x_2} + \delta}{\sigma_t^2} \right), \quad (n = 2), \tag{71}$$

$\delta$ being defined as in (70). $\lambda_4^*$ generalizes the “split-half”
bound $\lambda_4$, defined elsewhere (3), to the case of experimental dependence.
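
A minimal sketch of the split-half bound (71) applied to raw part scores is given below. It treats the observed scores as if they were the whole population, so sampling error (which the present paper sets aside) is ignored; the function names, the data, and the value chosen for `delta` are hypothetical.

```python
def pvariance(values):
    """Population variance (divisor N) of a list of scores."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def lambda4_star(half_a, half_b, delta=0.0):
    """Bound (71) from two part scores observed on a single trial.

    delta is the assumed value of 2 * E_i gamma_{x_1i x_2i} from (70);
    delta = 0 corresponds to experimental independence of the halves.
    """
    total = [a + b for a, b in zip(half_a, half_b)]
    part_variances = pvariance(half_a) + pvariance(half_b)
    return 2 * (1 - (part_variances + delta) / pvariance(total))


# Hypothetical half scores for four respondents.
half_a = [1, 2, 3, 4]
half_b = [2, 1, 4, 3]
bound_independent = lambda4_star(half_a, half_b)
bound_dependent = lambda4_star(half_a, half_b, delta=0.5)
```

The halves need not be equivalent for the bound to hold; only the value of `delta` has to be specified from the law of dependence.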
In the lower bounds defined in (42), (49), (60), and (71), respectively,
$\sigma_t^2$ and each $\sigma^2_{x_j}$ are observable from but a single trial, as
proved elsewhere in (3). $\delta$ and $\Gamma_2$ depend on the law of experimental
dependence among the part scores; different laws will give these different
working formulas. The present general formulas show precisely how and where
experimental dependence enters the problem, no matter what the nature of the
data or the law of dependence.


NOTE ON MILLER’S “FINITE MARKOV PROCESSES IN
PSYCHOLOGY”


Richard C. W. Kao
UNIVERSITY OF MICHIGAN


In his article ‘Finite Markov Processes in Psychology,’* G. A. Miller 
derived a least-squares “estimate” for a matrix of transitional probabilities. 
However, the mathematical proof is found to be invalid. 

On page 158, Miller defined $\hat{N}$ by the equation

$$\hat{N} = N + C, \tag{19}$$

“where the elements of the matrix $C$ are the corrections that must be added
to the observed values in $N$ to give the best estimate $\hat{N}$.” He wished to
determine $\hat{T}$, the “best” estimate of the transformation. From the
definitions of $\hat{T}$, $M$, $N$, $\hat{N}$, and $C$, he argued that the following
equation holds:

$$\hat{T}M = \hat{N} = N + C. \tag{20}$$

It is clear from this equation that $\hat{T}$ would be “best” in a trivial sense
if $C$ is assumed to be the zero matrix, i.e., $\hat{N} = N$. We shall show that
Miller had in fact derived only this trivial estimate by means of his undefined
mathematical technique.†

From equation (20) Miller obtained another expression for $C$:

$$C = -N + \hat{T}M. \tag{21}$$

For a least-squares solution, he argued that $CC'$ must be a minimum. But
this minimum cannot be obtained by simply “... putting the partial derivative
with respect to $\hat{T}$ to zero:”

$$\frac{\partial}{\partial \hat{T}}\, CC' = MC' = 0, \tag{22}$$

for the operation of differentiating a function of a matrix with respect to the
matrix has not been defined at all in this connection. It is obvious from
equation (21) that $C'$ is as much a function of $\hat{T}$ as $C$. Hence, in
differentiating the expression

$$CC' = (-N + \hat{T}M)C' = -NC' + \hat{T}MC',$$

one cannot assert that $\partial/\partial\hat{T}\,(-NC') = 0$. For then one would
be asserting $\partial\hat{T}'/\partial\hat{T} = 0$, a result which is inconsistent
with the undefined operation $\partial\hat{T}/\partial\hat{T} = 1$, in the case of
symmetric matrices where one clearly has $\hat{T}' = \hat{T}$.* Hence, the first
equality in equation (22) cannot be meaningful.†

*Psychometrika, 1952, 17, 149-167.
†For a valid mathematical proof of a least-squares estimate in this connection,
see Goodman’s “A Further Note on ‘Finite Markov Processes in Psychology.’ ” This
issue, 245-248.

The second equality in equation (22) in effect requires $C$ to be a zero
matrix. For in order that $MC' = 0$ for whatever $M$, $C'$ or $C$ must be zero.
Granted that zero-divisors (i.e., $AB = 0$ for $A \ne 0$ and $B \ne 0$) are
possible in dealing with matrices or rings in general, it is still true that the
second equality in equation (22) holds only if $C$ vanishes identically. This is
so because no restriction has been placed upon $M$ except that it be of order
$a \times (n - 1)$, and consequently, one can choose a matrix $M$ with all positive
elements such that $MC' = 0$ only if $C' = C = 0$. In view of the fact that
equation (22) is asserted to hold in general, we conclude that it does only if
$C$ is the zero matrix, from which the tautological nature of Miller’s argument
becomes clear.


*This argument is valid whether one chooses to use matrix or scalar notation for
differentiation. Private communication with Professor Miller shows that he does
the latter. In fact, he reasons that there are $(n - 1)$ equations in
$(n - 1) + 2$ unknowns in $C$ and $\hat{T}$, and the remaining two equations are
obtained from setting the partial derivatives of $\sum_{i=1}^{n-1} c_i^2$ with
respect to $\hat{t}_1$ and $\hat{t}_2$ to zero:

$$\frac{\partial}{\partial \hat{t}_j} \sum_{i=1}^{n-1} c_i^2 = 2 \sum_{i=1}^{n-1} c_i\, \frac{\partial c_i}{\partial \hat{t}_j} = 0, \quad (j = 1, 2);$$
whence $MC' = 0$ on substituting $\{\partial c_i/\partial \hat{t}_j\}$ by $M$. It is
unclear why Professor Miller chooses one particular element in a matrix in his
“matrix differentiation” and concludes that the whole matrix has thereby been
minimized. In the case of two alternatives, various special assumptions lead to
a matrix $CC'$ of the form

$$\begin{Vmatrix} \sum_{i=1}^{n-1} c_i^2 & -\sum_{i=1}^{n-1} c_i^2 \\ -\sum_{i=1}^{n-1} c_i^2 & \sum_{i=1}^{n-1} c_i^2 \end{Vmatrix}.$$

If one minimizes, as Miller does, the upper left element with respect to
$\hat{t}_j$, $j = 1, 2$, is one not simultaneously maximizing the upper right
element with respect to the same thing? The fact remains that the elements of
the matrix $C$ are so functionally dependent on each other as not to permit the
peculiar differentiation used by Miller. For a least-squares solution, it is
sufficient to require only the elements on the principal diagonal of the matrix
$CC'$ to be a minimum. But this interpretation is a far cry from asserting that
the whole matrix $CC'$ is minimal. Indeed, the exact meaning of Miller’s argument
that $CC'$ be a minimum is unclear.

†We note here the distinction between Miller’s matrix differentiation and that of
Dwyer and Macphail, Symbolic matrix derivatives. Ann. math. Statist., 1948, 19,
517-534, esp. 523, 528-530.

Apart from all these considerations, Miller went on to substitute his
equation (21) into equation (22) and obtained the expression

$$M(-N + \hat{T}M)' = -MN' + MM'\hat{T}' = 0.$$

By rearranging terms, he got what he called the “best” estimate of $T$:

$$\hat{T} = NM'(MM')^{-1}. \tag{23}$$

In case $M$, $N$ are non-singular matrices (Miller’s assumption that they be of
order $a \times (n - 1)$ does not, of course, prevent $a$ from being $n - 1$), or
$M$, $N$ have inverses, we may show the tautological nature of Miller’s argument
by a different method.* We proceed to simplify equation (23) as follows:

$$\hat{T} = NM'(MM')^{-1} = NM'(M')^{-1}M^{-1} = NIM^{-1} = NM^{-1}$$

or

$$\hat{T}M = N,$$

which shows again that in equation (20) Miller had assumed $C = 0$ or
$\hat{N} = N$. In case $M$, $N$ are singular, this second proof would not apply;
but the first still would. On the other hand, neither could Miller capitalize on
the irrelevant fact that $M$, $N$ are singular to prove the validity of his
results. We have here something which is essentially a mathematical identity,
the validity of which is independent of the choice of $M$ and $N$. Hence, in order
to show that equation (23) does not hold in general except in the trivial sense,
it is sufficient to produce one counter-example where $M$, $N$ are non-singular.
For the logical denial of a proposition which reads, “for all $x$, $P(x)$ is true”
is that “there exists one $x$ such that $P(x)$ is false.”
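
The simplification of equation (23) for square, non-singular $M$ can be checked on a small case. The sketch below uses exact rational arithmetic and hand-written 2 × 2 helpers; the helper names and the matrices `M` and `N` are hypothetical values chosen only so that each column sums to one, and they are not data from Miller's article.

```python
from fractions import Fraction as Fr


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def estimate_23(M, N):
    """Equation (23): N M' (M M')^{-1}, for a matrix M with two rows."""
    Mt = transpose(M)
    return matmul(matmul(N, Mt), inverse_2x2(matmul(M, Mt)))


# Hypothetical case with a = 2 alternatives and n - 1 = 2 columns, so M is square.
M = [[Fr(1, 2), Fr(3, 5)],
     [Fr(1, 2), Fr(2, 5)]]
N = [[Fr(3, 5), Fr(7, 10)],
     [Fr(2, 5), Fr(3, 10)]]

T_hat = estimate_23(M, N)
T_direct = matmul(N, inverse_2x2(M))   # N M^{-1}
reproduced_N = matmul(T_hat, M)        # equals N when M is non-singular
```

In this square case the estimate reproduces $N$ exactly, so the implied correction matrix $C$ is zero, which is the point of the simplification above.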

As a casual remark we note that setting the partial derivatives with 
respect to a variable to zero is only a necessary and not a sufficient condition 
for obtaining a minimum. For the latter, the second-order condition cannot be 
ignored. Granted that the experimental interpretation of CC’ is such that a 
maximum is unlikely or even impossible, there is no assurance, on the other 
hand, that the stationary value obtained from using only the first-order 
condition is extremal at all. The mathematical problem of minimizing quad- 
ratic forms in general is not as simple as one may presume. 

Finally, it is to be noted that the contention in this note refers only to
Miller’s “mathematical proof” of his best estimate for the matrix of
transitional probabilities and not to his “psychological interpretation” of
finite Markov processes.


Manuscript received 11/19/52 
Revised manuscript received 4/4/53 
*We note that this argument does not apply to Goodman’s results, where M, N are
column vectors.


A FURTHER NOTE ON “FINITE MARKOV PROCESSES IN
PSYCHOLOGY”*


Leo A. Goodman
THE UNIVERSITY OF CHICAGO


In his interesting article ‘Finite Markov Processes in Psychology,” 
G. A. Miller derived a “least-squares estimate’ for a matrix of transitional 
probabilities (1). However, the mathematical proof seems to be unclear. 
Since this proof is considered invalid (2), we shall present a somewhat clearer 
version of the proof of this result. We shall also examine the general problem 
in some detail. 

In the proof we shall assume that the reader is familiar with matrix
notation, which enables a considerably shorter presentation. We shall follow
the matrix conventions and the terminology adopted by Miller (1).

Let $m_{ik}$ ($i = 1, 2, \ldots, a$) represent the observed distribution on the
$k$th trial ($k = 1, 2, \ldots, n$). That is, $m_{ik}$ is the proportion observed
in the $i$th alternative quantity on the $k$th trial. There are $a$ alternative
quantities and $n$ such trials. Let

$$M = \begin{bmatrix} m_{11} & m_{12} & \cdots & m_{1,n-1} \\ m_{21} & m_{22} & \cdots & m_{2,n-1} \\ m_{31} & m_{32} & \cdots & m_{3,n-1} \\ \vdots & \vdots & & \vdots \\ m_{a1} & m_{a2} & \cdots & m_{a,n-1} \end{bmatrix}$$

be the $a \times (n - 1)$ matrix formed by placing in successive columns the
distributions observed on successive trials, from trial 1 through trial $n - 1$.
Following the notation adopted by Miller, we let $t_{ij}$ be the transitional
probability that an observation which is in the $j$th alternative quantity
($j = 1, 2, \ldots, a$) at a given trial, will be in the $i$th alternative quantity

*This report was prepared in connection with research supported by the Office of 
Naval Research, 




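($i = 1, 2, \ldots, a$) on the following trial. We define the row vectors
$T_i = [t_{i1}, t_{i2}, \ldots, t_{ia}]$ and the $a \times a$ transformation matrix

\[
T = \begin{bmatrix}
t_{11} & t_{12} & \cdots & t_{1a} \\
t_{21} & t_{22} & \cdots & t_{2a} \\
t_{31} & t_{32} & \cdots & t_{3a} \\
\vdots & \vdots &        & \vdots \\
t_{a1} & t_{a2} & \cdots & t_{aa}
\end{bmatrix}.
\]

We also define the row vectors $N_i = [m_{i2}, m_{i3}, m_{i4}, \ldots, m_{in}]$ and
$C_i = T_i M - N_i$. The problem is then to determine a matrix

\[
\hat{T} = \begin{bmatrix}
\hat{t}_{11} & \hat{t}_{12} & \cdots & \hat{t}_{1a} \\
\hat{t}_{21} & \hat{t}_{22} & \cdots & \hat{t}_{2a} \\
\hat{t}_{31} & \hat{t}_{32} & \cdots & \hat{t}_{3a} \\
\vdots       & \vdots       &        & \vdots       \\
\hat{t}_{a1} & \hat{t}_{a2} & \cdots & \hat{t}_{aa}
\end{bmatrix}
\]
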
such that $C_i C_i'$ is minimized for all values of $i$ ($i = 1, 2, \ldots, a$) when
$T$ is taken equal to $\hat{T}$. By the usual proof in the theory of least squares
(cf. [3], 55), we see that $C_i C_i'$ is minimized when $T_i$ is taken equal to

\[
\hat{T}_i = N_i M'(MM')^{-1} \quad \text{(if $MM'$ is nonsingular).}
\]

Hence, the “least-squares estimate” is

\[
\hat{T} = \begin{bmatrix} \hat{T}_1 \\ \hat{T}_2 \\ \hat{T}_3 \\ \vdots \\ \hat{T}_a \end{bmatrix},
\]

which is, in fact,

\[
\hat{T} = NM'(MM')^{-1},
\]

where $N$ is defined as

\[
N = \begin{bmatrix} N_1 \\ N_2 \\ N_3 \\ \vdots \\ N_a \end{bmatrix}.
\]

Q.E.D.
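
The estimate $\hat{T} = NM'(MM')^{-1}$ can be illustrated numerically. The following is only a minimal sketch in Python: the helper functions and the 2 × 4 array of observed distributions are hypothetical and are not taken from Miller's or Goodman's papers. Exact fractions are used so that the column sums of the estimate can be inspected without rounding error.

```python
from fractions import Fraction as F

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(x * y for x, y in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Gauss-Jordan inverse of a small square matrix (raises if singular)."""
    n = len(A)
    aug = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if pivot is None:
            raise ValueError("MM' is singular; the estimate is not defined")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def least_squares_transition(dist):
    """dist is a x n; column k holds the distribution observed on trial k."""
    M = [row[:-1] for row in dist]   # trials 1 .. n-1
    N = [row[1:] for row in dist]    # trials 2 .. n
    Mt = transpose(M)
    return matmul(matmul(N, Mt), inverse(matmul(M, Mt)))

# Hypothetical observations: a = 2 alternatives, n = 4 trials.
dist = [[F(1, 2), F(3, 5), F(7, 10), F(3, 4)],
        [F(1, 2), F(2, 5), F(3, 10), F(1, 4)]]
T_hat = least_squares_transition(dist)
col_sums = [sum(col) for col in zip(*T_hat)]
print(col_sums)
```

Each column of the estimate sums to one here because every observed distribution sums to one. Individual entries of the estimate are not guaranteed to lie between 0 and 1.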


We might wish our estimates $\hat{T}$ of the parameter $T$ to have some of the
same properties as the parameter $T$. For example, it may sometimes be
desirable to require $[1]\hat{T} = [1]$, since we have that $[1]T = [1]$, where $[1]$ is
the $a$-dimensional row vector $[1, 1, \ldots, 1]$. We shall now prove that the
“least-squares estimate” $\hat{T}$ has this desirable property.

THEOREM: We have that $[1]\hat{T} = [1]$, where $\hat{T} = NM'(MM')^{-1}$.

PROOF: From the definition of $N$, we see that

\[ [1]N = [1]. \]

Hence, it is sufficient to prove that

\[ [1]M'(MM')^{-1} = [1], \]

or

\[ [1]M' = [1]MM'. \]

From the definition of $M$, we have that $[1]M = [1]$. Hence we have that
$[1]MM' = [1]M'$. Q.E.D.


We might also have obtained these results using the general regression
methods presented by S. S. Wilks (4). The problem is that of estimating
$a \times a$ parameters which are subject to $a$ linear restraints. We shall be
interested in minimizing $\sum_{i=1}^{a} C_i C_i'$ in order to obtain the “least-squares
estimate.” In other words, we wish to estimate a parameter which is a point in
an $a(a - 1)$ dimensional space. Since $t_{ij} \ge 0$, the parameter will lie in a
subset of this space. If we also wish our estimate $\hat{T}$ to lie in this same
subset, the method of estimation is still quite straightforward but sometimes
tedious. We first obtain the “least-squares estimate” $\hat{T}$. If this estimate
lies in the subset (i.e., $\hat{t}_{ij} \ge 0$), then $\hat{T}$ is used to estimate $T$. If
$\hat{T}$ is not included in the subset, then the appropriate estimate will lie on
the boundary of the subset. We then use that estimate on the boundary of the
subset which minimizes $\sum_{i=1}^{a} C_i C_i'$.

The numerical examples in (1) illustrate how this result is used in a
learning experiment in a T-maze. As the author himself has pointed out, the
least-squares fit described in (1) is not most efficient for Markov processes.
If the observed transitional proportions are available, they would clearly be
more appropriate in the estimation of transitional probabilities.
A PROPOSED INDEX OF THE CONFORMITY OF ONE
SOCIOMETRIC MEASUREMENT TO ANOTHER*

An index is proposed to measure the extent of agreement of the data of a
sociometric test with another test made at an earlier time or on another test 
criterion. The index is used to define an index of concordance between the two 
tests. It is shown how the index may be used for either individuals or groups. 
Tests of the hypothesis that agreement is random are given for all cases and 
applied to an example. 


1. Introduction. Whenever two or more (sociometric) measurements are 
made of the interpersonal relationships among one group of individuals, 
questions immediately arise concerning the extent to which one set of measure- 
ments conforms to another. Examples of this kind abound in the literature; 
we will note only a few. 

A second measurement on the same group invariably raises the intriguing 
question of how much the pattern of relationships observed in the earlier 
measurement has persisted in the later. Further (and in a more fundamental 
sense), what is the nature of this persistence? An example of this kind of 
data was presented to the authors by Dr. Hilda Taba of San Francisco 
State College. In the course of an extensive study of a class of 25 children, 
she asked each of them, at intervals of about two months, to name the three 
others they preferred to be seated with, in smaller groups. The resulting 
series of measurements provide information on the persistence of choice 
patterns and, in particular, data to test the hypothesis that the persistence 
phenomenon is stationary, i.e., dependent only on the time interval between 
a pair of measurements. 

In the study discussed above, the subjects were also asked at one point 
to name those others with whom they would like to work on a specific activity, 
namely, mathematics assignments. Noting some conformity of the special 
choice patterns to the general, we ask whether this is a random effect, and, 
if not, what is the order of the excess over chance conformity? Similar ques- 
tions might be raised for other special situations; in this case, the relative 


*Work done under the sponsorship of the Office of Naval Research. 
excesses could be used to determine (inversely) how “special” each situation
is, in comparison with the others. 

As a final example, and the one to which we shall return for illustration 
because of the smaller numbers involved, we consider the technique in which 
each individual is asked to give, first, his choices among the others, and, 
second, his guesses as to which of the others will choose him. Here, we are 
usually concerned with whether, and to what extent, the perceived choice 
configuration conforms to the actual. Also, in this case to a greater measure 
than in the preceding, we are interested in the variable accuracy of perception 
among the individuals in the group. 

In each of the last two examples, we encounter the usual confusion of 
“independent” and “dependent” variables, with neither bearing a causal 
relationship to the other. In the first example, with measurements in a natural 
time sequence, there is no possibility of confusion of priority. We shall return 
to this question in section 5. 


2. The index. For a group of n individuals there are n(n − 1) ordered pairs.
Our basic information specifies, for each ordered pair, whether (a) neither
relation exists, (b) relation X exists but not Y, (c) Y exists but not X, or
(d) both X and Y exist. Following custom, we take relation X to be the
prior or “independent” relation; Y the posterior or “dependent” relation.
The generic question we ask is: To what extent does the occurrence of relation 
Y in the ordered pairs conform to the occurrence of relation X? In any 
specific instance, our data may be summarized as in the fourfold distribution 
of Table 1. 


TABLE 1

Joint Distribution of Occurrence of X and Y for a Group of n Persons

              $Y$               $\bar{Y}$               Total
$X$           $n_{xy}$          $n_{x\bar{y}}$          $n_x$
$\bar{X}$     $n_{\bar{x}y}$    $n_{\bar{x}\bar{y}}$    $n_{\bar{x}}$
Total         $n_y$             $n_{\bar{y}}$           $n(n - 1)$


In Table 1, $n_{x\bar{y}}$ (for example) represents the number of ordered pairs
in which relation X occurs and relation Y does not occur from the first
individual to the second of the pair; $n_y$, equal to the sum of $n_{xy}$ and
$n_{\bar{x}y}$, is the number of pairs having relation Y without regard to
occurrence or non-occurrence of X. In this form, it is well known that questions
of dependence or concomitant variation depend on the four numbers in the body
of the table and that the marginal totals provide only a frame of freedom for the
body of the table when the marginal totals are considered to be known a
priori. Hence we may take any one of the numbers in the body of the table,
say $n_{xy}$, as the essential variable. Lastly, $n_{xy}$ is known to possess the
hypergeometric probability distribution under the hypothesis that X and Y are
independent, or, in our terms, that occurrence of Y does not particularly
conform to occurrence of X.

We now proceed to construction of an index of conformity ($\Gamma$) having
the following three properties:


(1) $\Gamma = 0$ when X and Y are independent,

(2) $\Gamma = 1$ when occurrence of Y conforms exactly to occurrence of X
(note that we say nothing about non-occurrences of Y),

(3) $\Gamma$ is estimated by a linear function of $n_{xy}$.


We first note that, when X and Y are independent, $E(n_{xy}) =
n_x n_y / n(n - 1)$. Since we know that $n_{xy}$ cannot be less than zero nor greater
than $n_x$, we define $\Gamma$ by the following expression for the conditional
expected (mean) value of $n_{xy}$:

\[
E(n_{xy} \mid \Gamma) = \frac{n_x}{n(n - 1)} \, (n_y + \Gamma n_{\bar{y}}). \tag{1}
\]

We observe that $E(n_{xy} \mid \Gamma = 0) = n_x n_y / n(n - 1)$, which is precisely the
condition for statistical independence of X and Y. Secondly,
$E(n_{xy} \mid \Gamma = 1) = n_x$, i.e., every ordered pair which has relation X also
has relation Y and the conformity is complete. $\Gamma$ is a linear function of
$E(n_{xy})$; therefore, we take for our estimate of $\Gamma$ the solution of (1) with
$E(n_{xy})$ replaced by the observed $n_{xy}$. This gives

\[
\hat{\Gamma} = \frac{1}{n_x n_{\bar{y}}} \, [n(n - 1) n_{xy} - n_x n_y]. \tag{2}
\]

Equation (1) expresses that the expected value of $n_{xy}$ depends upon
the underlying parameter $\Gamma$, which may take any value in the interval
$[-(n_y / n_{\bar{y}}), 1]$. In most applications, $n_y$ is smaller than $n_{\bar{y}}$.
Since the estimate of equation (2) has the appropriate conventional expected
values of zero when Y does not conform to X and unity when conformity is
perfect, we may (and shall) take the estimate to be our index of conformity.

One advantage of this choice of index is that we have immediately an
unbiased estimate of the conceptual underlying parameter. Another advantage,
we shall see, is that $\hat{\Gamma}$ lends itself to simple standard tests of the
hypothesis that $\Gamma = 0$. Still another advantage, not insignificant from the
standpoint of the practitioner, is the ease of computation of this index.
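
As an illustration of equation (2), the following minimal Python sketch computes the index from the four cell counts of a fourfold table. The function name and argument names are ours, introduced only for illustration; the counts 19, 8, 12, 51 are the cell entries of Table 5 in §6.

```python
def conformity_index(n_xy, n_x_noty, n_notx_y, n_notx_noty):
    """Index of conformity of Y to X, equation (2), from the four cells."""
    total = n_xy + n_x_noty + n_notx_y + n_notx_noty  # n(n - 1) ordered pairs
    n_x = n_xy + n_x_noty                             # pairs in which X occurs
    n_y = n_xy + n_notx_y                             # pairs in which Y occurs
    n_noty = total - n_y                              # pairs in which Y is absent
    return (total * n_xy - n_x * n_y) / (n_x * n_noty)

gamma_hat = conformity_index(19, 8, 12, 51)
print(round(gamma_hat, 2))  # 0.55

# With the same margins, n_xy = 0 gives the lower limit -(n_y / n_noty).
lower = conformity_index(0, 27, 31, 32)
```

With these counts the index is 873/1593, which agrees with the value .55 reported in Table 5.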


3. Test of the hypothesis of absence of conformity. In most cases in practice,
we will believe on intuitive grounds that there actually is some degree of
conformity present and shall desire only to estimate that degree. Nevertheless,
it is logically necessary that we establish that our belief is well founded.
Accordingly, we give in this section a test of the hypothesis that $\Gamma = 0$.
It was shown by Katz (2) in 1941 that a “best” test for $\Gamma = 0$ against
alternatives $\Gamma > 0$ (in the likelihood-ratio sense) is given by the upper tail of
the hypergeometric distribution and that a “best unbiased” test against
$\Gamma \neq 0$ is given by two tails of the same distribution so chosen that the mean
value of the tails is equal to the mean of the entire distribution. (Note that,
aside from choice of critical regions, this is the well-known Fisher “exact” test
for the four-fold table.)

Thus, whenever $n(n - 1)$ is small, we may test the hypothesis exactly,
using the recent tables given by Finney (1). When, as is more likely, $n(n - 1)$
is large and each of $n_{xy}$, $n_{x\bar{y}}$, $n_{\bar{x}y}$, and $n_{\bar{x}\bar{y}}$ is large
enough (say, >2), the $\chi^2$ approximation is adequate. A simple computation gives

\[
\chi^2 = \frac{n(n - 1) \, n_x n_{\bar{y}}}{n_{\bar{x}} n_y} \, \hat{\Gamma}^2, \tag{3}
\]

with one d.f. Even simpler is

\[
z = \sqrt{\frac{n(n - 1) \, n_x n_{\bar{y}}}{n_{\bar{x}} n_y}} \; \hat{\Gamma}, \tag{3a}
\]

which is approximately normally distributed about zero with unit variance.
In case $n_x = n_y$, (3a) reduces approximately to $z = n\hat{\Gamma}$ and the hypothesis
is rejected at the 5% level whenever $|\hat{\Gamma}| > 2/n$, approximately.

As always when the $\chi^2$, or the equivalent $z$, approximation to the exact
test is made, one should make the Yates correction, consisting of either
adding or subtracting one-half unit from $n_{xy}$ so as to decrease the absolute
value of $\hat{\Gamma}$.
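
The approximate tests (3) and (3a), with and without the Yates correction, can be sketched in Python as follows. This is an illustrative sketch only: the function name is ours, and the rule that the correction never moves $n_{xy}$ past its expected value is an assumption of ours rather than something stated in the text. The cell counts are again those of Table 5.

```python
import math

def conformity_tests(n_xy, n_x_noty, n_notx_y, n_notx_noty, yates=False):
    """Return (index, z, chi-square) for the hypothesis of no conformity."""
    total = n_xy + n_x_noty + n_notx_y + n_notx_noty
    n_x = n_xy + n_x_noty
    n_notx = total - n_x
    n_y = n_xy + n_notx_y
    n_noty = total - n_y
    observed = float(n_xy)
    if yates:
        # Move n_xy half a unit toward its expected value under independence
        # (assumption: never past it), which shrinks the absolute index.
        expected = n_x * n_y / total
        shift = min(0.5, abs(observed - expected))
        observed += shift if observed < expected else -shift
    gamma = (total * observed - n_x * n_y) / (n_x * n_noty)        # equation (2)
    z = math.sqrt(total * n_x * n_noty / (n_notx * n_y)) * gamma   # equation (3a)
    return gamma, z, z * z                                         # z*z is (3)

gamma, z, chi2 = conformity_tests(19, 8, 12, 51)
gamma_c, z_c, chi2_c = conformity_tests(19, 8, 12, 51, yates=True)
print(round(z, 2), round(chi2, 2), round(chi2_c, 2))
```

The uncorrected value of (3) coincides with the usual chi-square for a fourfold table, $N(ad - bc)^2$ divided by the product of the four marginal totals, which gives a convenient check on the arithmetic.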


4. The index for an individual. One is often in the position of wishing to make
tests and estimates, similar to those described above, for the individual
members of the group. In these instances, one asks, “To what extent does
an individual’s pattern of (outgoing) choices persist to a later time or to
another criterion of choice?” or “To what extent is the set of an individual’s
incoming choices unchanged?” In either case, our data are the observations
on (n − 1) ordered pairs and may be exhibited as in Table 2.


TABLE 2
Joint Distribution of Occurrence of X and Y for a Single Individual

              $Y$                  $\bar{Y}$                  Total
$X$           $n_{xy}(i)$          $n_{x\bar{y}}(i)$          $n_x(i)$
$\bar{X}$     $n_{\bar{x}y}(i)$    $n_{\bar{x}\bar{y}}(i)$    $n_{\bar{x}}(i)$
Total         $n_y(i)$             $n_{\bar{y}}(i)$           $n - 1$

In this table, entries are interpreted as in Table 1 except that it is
necessary to note that the entries in the body of the table and the marginal
totals are those for the $i$th individual. Everything goes through exactly as in
§2 and we obtain the index of conformity for the $i$th individual,

\[
\hat{\gamma}(i) = \frac{1}{n_x(i) \, n_{\bar{y}}(i)} \, [(n - 1) n_{xy}(i) - n_x(i) n_y(i)]. \tag{4}
\]

The tests of §3 hold as before with $\hat{\Gamma}$ replaced by $\hat{\gamma}(i)$, $n(n - 1)$ by
$(n - 1)$, and the marginal totals by the corresponding totals for the $i$th
individual. Here we shall usually require the exact test and, for very small
groups, the test may break down in the sense that we are unable to reject
the hypothesis of lack of conformity whatever be the value of $n_{xy}(i)$. We
shall observe this in the example of §6, in which we deal with a group of
ten persons.
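
For a single individual, the index (4) and the exact upper-tail probability of the hypergeometric distribution can be sketched in Python as below. The function names and the marginal totals used in the examples are hypothetical, chosen by us for a group of ten persons (nine ordered pairs per individual); they are not read from Tables 3 and 3a.

```python
from math import comb

def individual_index(n_xy, n_x, n_y, pairs):
    """Equation (4); pairs is n - 1, the ordered pairs for one individual."""
    return (pairs * n_xy - n_x * n_y) / (n_x * (pairs - n_y))

def upper_tail(n_xy, n_x, n_y, pairs):
    """Exact Pr{n_xy or more} under independence with fixed margins."""
    top = min(n_x, n_y)
    favourable = sum(comb(n_y, k) * comb(pairs - n_y, n_x - k)
                     for k in range(n_xy, top + 1))
    return favourable / comb(pairs, n_x)

# Hypothetical margins: 3 perceived choices, 4 actual choices, all 3 agree.
gamma_i = individual_index(3, 3, 4, 9)
p_i = upper_tail(3, 3, 4, 9)          # 4/84, just under .05

# Breakdown case: with one perceived choice and three actual choices the
# smallest attainable tail probability is 3/9, so rejection is impossible.
p_min = upper_tail(1, 1, 3, 9)
print(gamma_i, round(p_i, 3), round(p_min, 3))
```

The second example shows how, with very small marginal totals, even complete agreement cannot reach the 5% level.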


5. The ambiguous case. In many situations, it is not possible to identify
one relation as antecedent or “independent” with respect to the other. In
this case, we have exactly the same problem which arises in any regression
analysis when we are uncertain as to which regression coefficient is meaningful.
Accordingly, we shall use exactly the same device for resolving the difficulty.
We define a coefficient of concordance as the geometric mean of the two
indices of conformity of X with Y and of Y with X. Thus, we obtain

\[
C = \sqrt{\hat{\Gamma}_1 \hat{\Gamma}_2}
  = \frac{1}{\sqrt{n_x n_{\bar{x}} n_y n_{\bar{y}}}} \, [n(n - 1) n_{xy} - n_x n_y], \tag{5}
\]

where C is the coefficient of concordance and $\hat{\Gamma}_1$, $\hat{\Gamma}_2$ denote the two
indices of conformity. We should attach to C the algebraic sign of the factor in
square brackets since this is the sign of both indices of conformity.
We observe that $C^2 = \chi^2 / n(n - 1)$, the mean square contingency of
Table 1. [See, for example, Kendall (3), 318-319.] Finally, we note that the
test of significance for C is exactly the same as for either index of conformity.
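
A short Python sketch of the coefficient of concordance follows, using the cell counts of Table 5 in §6. The function name is ours. The sketch also checks numerically that C squared equals the product of the two indices of conformity, which is the geometric-mean relation stated above.

```python
import math

def concordance(n_xy, n_x_noty, n_notx_y, n_notx_noty):
    """Coefficient of concordance, equation (5), with its algebraic sign."""
    total = n_xy + n_x_noty + n_notx_y + n_notx_noty
    n_x, n_notx = n_xy + n_x_noty, n_notx_y + n_notx_noty
    n_y, n_noty = n_xy + n_notx_y, n_x_noty + n_notx_noty
    bracket = total * n_xy - n_x * n_y
    return bracket / math.sqrt(n_x * n_notx * n_y * n_noty)

c = concordance(19, 8, 12, 51)

# The two indices of conformity for the same table (Y to X, and X to Y).
gamma_1 = (90 * 19 - 27 * 31) / (27 * 59)
gamma_2 = (90 * 19 - 27 * 31) / (31 * 63)
print(round(c, 3), round(math.sqrt(gamma_1 * gamma_2), 3))
```

For these counts C comes out at about .495, which is consistent with the rounded value .50 given in Table 5.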


6. An example. We shall consider data collected and kindly made available
to us by Dr. Renato Tagiuri, of the Harvard University Laboratory of
Social Relations. Dr. Tagiuri asked each of a group of ten graduate students
first, “Which members of the group do you like most?” and second, “Which
members of the group do you feel like you most?” In this situation, armchair
philosophizing can construct a case for either argument (a) that these people
are fairly sophisticated and, hence, perceive relationships accurately or
argument (b) that these people are fairly sophisticated and, hence, conceal
their feelings so that relationships cannot be perceived accurately. Obviously,
there is a need for objective measurement of agreement. Equally obvious
is the ambiguity of the situation, for it is difficult to decide whether we are
primarily interested in how well perceived relationships conform to the
actual ones or in the obverse conformity. Since, fortunately, resolution of
this dilemma is not the purpose of this paper, we shall assume that the second
question is our primary concern although we will make both computations.
The data appear in Tables 3 and 3a.


TABLE 3
Positive Choices Expressed by Ten Individuals

[Table body not recoverable from the source.]


TABLE 3a
Positive Choices Perceived by Ten Individuals

[Table body not recoverable from the source.]

In both tables, the individuals’ responses appear in rows. Thus, the
first individual chooses (in Table 3) the second, third, and fifth and feels he 
will be chosen by (in Table 3a) the second, fifth, and tenth. For the ith in- 
dividual, we are concerned with how well the choices he actually receives, 
the ith column of Table 3, agree with the choices he thinks he will receive, 
the ith row of Table 3a. We first consider individual conformity; the results 
of these computations are summarized in Table 4. 


TABLE 4
Conformity of Actual to Perceived Choices of Individuals

Individual (i)    $n_{xy}(i)$    Pr{$n_{xy}(i)$ or more}    $\hat{\gamma}(i)$
      1               3               .12                   1.00
      2               3               .36                    .56
      3               1               .42                    .43
      4               2               [illegible]           1.00
      5               3               .12                   1.00
      6               0              1.00                    .00
      7               1               .58                    .24
      8               1               .22                   1.00
      9               3               .048*                 1.00
     10               2               .083                   .57

*Significant at 5% level.


In the case of a small group, such as this, it is difficult to obtain sharp 
tests of significance for individuals. Thus, while there is perfect agreement 
with the choices of five individuals, in only one instance (9) can we reject 
at the 5% level the hypothesis that agreement is by chance alone. We recom- 
mend, therefore, whenever it is necessary to examine individual conformity 
in small groups, that each individual be asked to name approximately half 
the group as those most likely to choose him in order to make the tests as 
sharp as possible. 

The story is quite different when we measure group agreement; here 
our observations are adequate to construct reasonable tests. Our data and 
computations are summarized in Table 5. 

For a group of ten, the approximate test of §3 indicates significant (5% 
level) departure from random agreement when | C | > .20; the degree of 
association is measured by | C |, where C has the force of a correlation co- 
efficient. In fact, if X and Y are interpreted as variables taking values 0 
and 1 only, C is the correlation coefficient. 

TABLE 5
Conformity of Actual to Perceived Choices of Entire Group

                        Actual Choices (Y)
Perceived
Choices (X)         $Y$       $\bar{Y}$     Total
$X$                  19           8           27
$\bar{X}$            12          51           63
Total                31          59           90

$\hat{\Gamma}_1$ (conformity of actual to perceived) = .55,
$\hat{\Gamma}_2$ (conformity of perceived to actual) = .45,
C (concordance between actual and perceived) = .50.


It may be worth while (from the psychologist’s point of view) to note
that the individuals who took part in this experiment were asked the same
two questions with respect to rejections, or negative choices. For these
data, the concordance index, C, was .14, not significantly different from
zero. We might conclude, therefore, that this group is able, to a limited
extent, to perceive positive feelings but seems practically unable to discern
existing negative feelings.
DIFFERENCES IN FACTOR CONTENT
OF RIGHTS AND WRONGS SCORES*


BENJAMIN FRUCHTER 
THE UNIVERSITY OF TEXAS 


The right-response scores and wrong-response scores of speeded aptitude 
tests were factor analyzed to determine whether they differ in factorial 
content. The information thus obtained was used to derive scoring formulas 
that yield purer measures of a factor than do scoring formulas derived in 
other ways. 


I. Introduction 


Different formulas are used to score tests or other measuring devices 
for different purposes. One formula may be used to correct for guessing, 
another for maximizing the reliability, and still a third for maximizing the 
validity of a test for a specific criterion. As has been pointed out by Thurstone 
(6), application of these scoring formulas will affect the correlations of timed 
or speeded tests only. For tests in which every item is attempted and scored 
either right or wrong, the correlation between the number of right and the 
number of wrong responses is −1.00. Consequently scores based on a formula
that corrects the number of right responses on an untimed test by some 
function of the number of wrong responses will have the same correlations 
with other tests as scores based on the number of correct responses only. 
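
The point can be checked with a small numerical sketch in Python. All the scores below, the test length of 20 items, and the particular formula R − W/3 are hypothetical values chosen by us for illustration; they do not come from the study.

```python
import math

def pearson(u, v):
    """Product-moment correlation of two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

items = 20                                  # every item attempted (untimed test)
rights = [12, 15, 9, 18, 14, 11]            # hypothetical rights scores
wrongs = [items - r for r in rights]        # wrongs are then fixed by rights
other = [30, 41, 25, 39, 37, 28]            # hypothetical scores on another test

formula = [r - w / 3 for r, w in zip(rights, wrongs)]  # a hypothetical formula

r_rw = pearson(rights, wrongs)
r_rights_other = pearson(rights, other)
r_formula_other = pearson(formula, other)
print(round(r_rw, 2), round(r_rights_other, 4), round(r_formula_other, 4))
```

Because the formula score is an increasing linear function of the rights score when every item is attempted, its correlation with any other variable is the same as that of the rights score.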

For tests administered under time-limit conditions, however, the wrongs 
scores may be relatively independent of the rights scores. The data in Table 
1, based on results of tests administered to aviation students during World 
War II, are exhibited to illustrate that the relatively independent wrongs 
scores derived from time-limit tests can have reliabilities and validities 
comparable to those of the rights scores. 

It was hypothesized that the low correlation between the right and 
wrong responses of some speeded tests might indicate that different functions 
were being measured by these two types of scores. For example, the probability 


*This paper is a revision of a dissertation submitted in partial fulfillment of the 
requirements for the Doctor of Philosophy degree at the University of Southern California, 
1948, and a paper read to the American Psychological Association in September, 1949. 
The writer is greatly indebted to Dr. J. P. Guilford for providing the intercorrelation and 
distribution data from the files of the School of Aviation Medicine and for general guidance 
throughout the study. He also wishes to express his appreciation to T/Sgt. Frank C. Ivens
and Sgt. James R. MacDonald for computational assistance in the extraction and rotation
of the centroid factors, which were performed while the writer was a civilian employee of
the Air Training Command, Human Resources Research Center. The opinions expressed
are those of the writer and do not necessarily reflect the official views of the USAF.
TABLE 1
Correlations Between Rights and Wrongs Scores, Reliabilities, and Validities of
Time-Limit Tests Administered to Aviation Students*

                                            Alternate-Forms        Validity
                                              Reliability         Coefficients      Training
Test                             $r_{RW}$   Rights    Wrongs     Rights    Wrongs   Criterion

Map Distance                      −.07      [illegible: "3060"]            −.18     navigator
Position Orientation              −.10                            .18      −.19     pilot
Object Recognition                −.22       .84       .76        .26      −.07     pilot
Estimation of Length              +.25†      .65   [illegible]    .13      −.01     pilot
Visualization of Maneuvers, C     −.35                            .34      −.20     navigator
Spatial Visualization, II         −.66                            .30      −.27     navigator

*Data obtained from Guilford (3).
†This unexpected positive correlation may be due, in part, to faulty reproduction of some of the test
items.


of arriving at the wrong answer to an item might be related to differences in 
numerical ability, whereas the probability of arriving at the right answer 
to the same item might be related to differences in reasoning ability and be 
unrelated to differences in numerical ability. 


II. Background 


Since rights and wrongs scores are relatively independent in some speeded 
tests, and both types of scores seem to contain reliable, valid variance, it 
would be of some interest to determine the nature of the differences between 
these two types of scores. Several studies have indicated the differing content 
of the rights and wrongs scores of a test. 

When the items of the Map Distance Test (3, 458-461), which had been 
administered under time-limit conditions to a sample of aviation cadets, 
were analyzed on the basis of the highest and lowest 27 per cent of the total 
scores obtained from the scoring formula S = R — 3W + 40, the mean phi 
(.32) based on the total number who had answered each item was higher 
than the mean phi (.10) based on the total group taking the test. These 
results are the reverse of what is usually expected from a highly speeded 
test, for which the phi’s based on those who answered an item usually would 
be near zero, whereas the phi’s based on the total group taking the test should 
regularly increase, from the last item which everyone had managed to reach, 
to the end of the test. As the speed element in a test decreases in importance, 
the discrepancy between phi’s computed on the two bases decreases until, 
with a pure power test, it disappears. 

An item analysis, performed against the criterion of total rights only on
the Map Distance test, yielded a mean phi based on the total answering each
item of .18, and a mean phi based on the total group taking the test of .31.
Apparently the rights score is a speed score, whereas the formula score,
heavily weighted for wrongs, is a power score.

Further evidence that wrongs scores may measure functions not found 
in rights scores is obtained from the analysis of tests designed to measure the 
trait of carefulness (2, 3). When four tests designed to measure carefulness 
were administered to a sample of aviation students, it was observed that 
large numbers of wrong responses were made and that the distributions of 
error scores had considerable range and variability. The rights and wrongs
scores of these tests were treated as separate variables
and correlated with the formula scores of a number of other tests. Factor
analysis of these correlations revealed a new factor, uniquely characteristic 
of the wrongs scores of the carefulness tests. Had the error scores not been 
analyzed separately, no factor resembling carefulness might have been 
found in these tests. 


III. Selection and Correlation of the Variables 


In order to determine the differential content of the rights and wrongs 
scores of time-limit tests and to estimate the factor content of various scoring 
formulas combining these two types of scores, a battery of forty-five ex- 
perimental tests was administered to a sample of unclassified aviation students 
under time-limit conditions. The sample consisted of 8,158 male, unclassified 
aviation students, mostly 18 and 19 years of age, and of average or above- 
average intelligence. Separate rights, wrongs, and formula scores were 
obtained for each test, and the product-moment intercorrelations for each 
type of score were calculated. Because of the length of the battery no examinee 
took all of the tests, and the N’s for the correlations vary from 385 to 1,558, 
with a median N of approximately 450. Each examinee also had taken a 
battery of twenty-one classification tests, and the correlations of the formula 
scores of each of these tests with each of the three types of scores (rights, 
wrongs, and formula) of the experimental tests were available. It was desired 
to select for analysis those tests whose wrongs scores were relatively in- 
dependent of their rights scores and also reliable. Although the correlations 
between the rights and wrongs scores and reliabilities of those scores were 
not available for this sample, they were available for some of the tests on 
comparable samples, and were used as a guide in selecting variables for the 
analysis. Only those wrongs scores whose distributions had sufficient
variability to justify confidence in the stability of their correlations with other
scores were selected for analysis. Twenty-four experimental tests were judged
to have sufficient variability, independence, and reliability in their wrongs
scores to merit analysis. To these were added the a priori formula scores of
seven classification tests of known factor content to help guide the rotations
and interpretations. The score distributions were normalized. A matrix
containing the intercorrelations of the wrongs scores of the twenty-four
experimental tests (with axes inverted to yield positive correlations with the
classification tests) and the formula scores of the seven reference tests was 
obtained. Another matrix, consisting of the intercorrelations of the normalized 
rights scores of the same twenty-four experimental tests and formula scores 
of the same seven reference tests, was assembled to be analyzed for com- 
parison. Since the experimental tests were administered in overlapping sub- 
batteries, the N’s for the correlations vary*. It was considered justifiable to 
include the correlation coefficients from the various overlapping samples in 
one matrix for factor analysis since the aviation-student population is quite 
homogeneous and individuals were randomly assigned to the various sub- 
batteries. 


IV. Analysis of the Data 


The correlation matrix for the twenty-four rights scores (variables 1 to 
24) and seven formula scores (variables 25 to 31) was analyzed by the centroid 
method to nine factors, and the factors were rotated to meaningful positions. 
The resulting centroid and rotated factor loadings are shown in Table 2. 
Similar data for the nine factors that were extracted from the wrongs scores 
are shown in Table 3. 

The first eight rotated factors were interpreted alike in both analyses 
and are as follows: 


I. Spatial orientation 
II. Visualization 
III. Associative memory 
IV. Numerical facility 
V. Verbal comprehension 
VI. Reasoning 
VII. Visual memory 
VIII. Length estimation 


The ninth factor in the rights analysis was considered to be a residual, 
the loadings ranging from —.20 to .18. The ninth factor in the wrongs analysis 
is a triplet, and although its nature is not obvious, it would be of interest to 
identify it, since it may represent a function unique to error scores. The tests 
with the highest loadings on this factor previously have appeared on factors
labeled “sequential reasoning” and “integration III” (3, 824 and 833). If the
former interpretation should prove to be the correct one, it is probably related 


*To reduce printing costs the tables containing the correlation matrices and the sizes 
of the samples on which the correlations were based have been deposited with the American 
Documentation Institute. Order Document No. 3954 from American Documentation 
Institute, Auxiliary Publications Project, Photoduplication Service, Library of Congress, 
Washington 25, D. C., remitting $1.25 for microfilm (images 1 inch high on standard 35mm. 
motion picture film) or $1.25 for photoprints readable without optical aid. 
to the reasoning-by-elimination which is sometimes used to select the answers
to multiple-choice items. 


V. Discussion 


As might be expected from their low correlation, the factor content of 
the rights and wrongs scores of some tests is quite different. Thus, for the 
Planning Air Maneuvers test, the principal loading for the rights score is on 
the reasoning factor, and for the wrongs score on the visualization factor. 
For the Visualization of Maneuvers test, the rights have a higher loading 
than the wrongs on the spatial-orientation factor, while the wrongs are more 
heavily loaded on the visualization factor. 

Similarly, for several of the factors, some of the loadings in the wrongs 
analysis were higher than the corresponding loadings in the rights analysis. 
This seems to be especially true of the visualization factor, on which twelve 
out of fourteen experimental tests had higher loadings for their wrongs scores 
than for their rights scores. For tests such as Planning Air Maneuvers, Map 
Distance, and Position Orientation, reasoning, length-estimation, or numerical 
abilities may have been used in arriving at the correct answers. Apparently, 
wrong answers frequently were arrived at because of incorrect visualization. 
The higher loadings of the wrongs scores may indicate that they are good 
measures of the visualization factor. Other factors that have higher loadings 
for the wrongs scores of some tests than for the corresponding rights scores 
are memory, number, and reasoning. Presumably for these tests the variance 
of the wrongs scores is more dependent on individual differences on these 
factors than is the variance of the corresponding rights scores. 

The question of whether the loadings of the formula scores can be predicted 
from the weighting of the rights and wrongs in the formula might also 
be raised.* 

*These loadings had been determined for all but one of the experimental tests in 
Guilford, Fruchter, and Zimmerman (5). 

Unfortunately, not all of the required data were available for this sample, 
but the necessary values (which, in addition to the loadings, are the standard 
deviations of the rights and wrongs scores and the correlations between the 
rights and wrongs) were available on comparable samples for some of the 
tests. The results of the attempt to estimate the loadings of the formula scores 
on certain factors from the loadings of the rights and wrongs scores on these 
factors, by means of the formula for the correlation of a weighted sum with 
a third variable, are shown in Table 4. For some factors the estimated loadings 
agree well with the obtained loadings, while for other factors the discrepancies 
are larger, possibly indicating that the cross-identification of the factors does 
not hold in these cases. In evaluating the results it should also be borne in 
mind that some of the values used in the calculations were estimated from 
other samples rather than obtained from this sample, and also that there was 
a larger number of factors in the formula-score battery analysis (owing to the 
larger variety of tests included) than was brought out in the analyses of the 
intercorrelations of the rights and wrongs scores. 
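
The estimation step described above can be sketched as follows. This is a minimal illustration and not the authors' computing procedure: it assumes the standard expression for the correlation of a weighted sum S = R + wW with a third variable (here a factor of unit variance), and the function name, argument names, and the numerical values in the usage lines are all hypothetical.

```python
import math


def weighted_sum_loading(a_r, a_w, sd_r, sd_w, r_rw, w):
    """Estimate the loading of the formula score S = R + w*W on one factor.

    a_r, a_w   : loadings of the rights and wrongs scores on the factor
    sd_r, sd_w : standard deviations of the rights and wrongs scores
    r_rw       : correlation between the rights and wrongs scores
    w          : weight given to the wrongs (e.g. -0.25 for S = R - .25W)

    All argument names are assumptions made for this sketch.
    """
    # Covariance of S with the (unit-variance) factor.
    cov_sf = sd_r * a_r + w * sd_w * a_w
    # Variance of the weighted sum S.
    var_s = sd_r ** 2 + (w * sd_w) ** 2 + 2.0 * w * sd_r * sd_w * r_rw
    if var_s <= 0.0:
        raise ValueError("the weighted sum has no variance")
    return cov_sf / math.sqrt(var_s)


# Hypothetical values, for illustration only (not taken from Table 4).
example = weighted_sum_loading(a_r=0.50, a_w=-0.30, sd_r=6.0, sd_w=4.0,
                               r_rw=-0.20, w=-0.25)
print(round(example, 2))
```

With w = 0 the function returns the loading of the rights score unchanged, which is a convenient check on the inputs.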


TABLE 4 
Comparison of the Obtained Factor Loadings of the Formula Scores 
with Loadings Predicted from the Factor Content of the Rights and Wrongs Scores 

                                                          Estimated        Obtained Formula- 
                                                          Formula-         Score Loading* 
Test                         Factor                       Score Loading      I       II 

 1. Map Memory               Visual memory                    .42           .50     .43 
                             Paired-associates memory         .42           .16     .26 
 7. Visualization of         Spatial orientation              .07           .59     .58 
    Maneuvers, Form C        Visualization                    .51           .44     .26 
11. Spatial Visualization I  Visualization                    .56           .63     .60 
14. Aerial Orientation       Spatial orientation              .44           .61     .62 
19. Plane Name Memory        Paired-associates memory         .04           .16     .12 

*The loading on the left is from solution I, and the loading on the right is from solution II in J. P. 
Guilford, B. Fruchter, and W. S. Zimmerman (5). 


If the rights and wrongs scores of a test are relatively independent and 
have their loadings largely on different factors, there are empirical methods 
for determining useful scoring-formula weights. Guilford and Michael (4) 
have developed formulas to determine the weights which, if applied to separate 
scores, would maximize the variance on a desired factor, minimize the variance 
on an undesired factor, or establish a specified ratio between them. Applying 
their formula (1) to the Map Distance test indicates that the scoring formula 
which would maximize the length-estimation variance of that test is 
(S = R + W). The loadings of this score, as estimated by the formula, are .67 
on the length-estimation factor and .15 on the visualization factor, whereas 
the loadings of the empirically derived, optimally valid scoring formula 
(S = R - 3W + 40) are reported to be .38 on the visualization factor and .30 on 
the length-estimation factor (cf. 3, 458). This test had been constructed 
because of interest in its length-estimation variance. Actually, a good measure 
of length-estimation ability had been constructed. Choice of a scoring formula 
that did not maximize the length-estimation variance, however, made it 
appear to be another test of just moderate validity with its major variance 
on the visualization factor which, although valid for the criterion then under 
consideration, is better measured by other tests. 

Another example of the importance of proper weighting of speeded tests 
can be given in connection with the Estimation of Length test. Estimating 
line lengths seems to enter into many perceptual and some visualization tasks. 
This test was constructed on the hypothesis that if the interpretation of the 
function represented by this factor is correct, the test should be a relatively 
pure measure of that function. It is a five-alternative multiple-choice test, 
and the conventional a priori weighting given the wrongs would be -.25. 
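
The value -.25 is what the usual correction for guessing gives for five alternatives. The relation below is the standard one and is supplied here as an assumption about what "conventional a priori weighting" refers to; the symbol k (number of alternatives per item) does not appear in the original text.

```latex
S = R - \frac{W}{k-1}, \qquad k = 5 \;\Rightarrow\; -\frac{1}{k-1} = -\frac{1}{4} = -.25
```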

Applying the formula for maximizing the desired variance to the data 
of the Estimation of Length test indicates that the scoring formula 
(S = R + .75W) would maximize the length-estimation variance and give the 
most useful information from the scores on this test. Application of the 
correlation-of-weighted-sums formula indicates that this weighting would 
yield a loading of .70 on the length-estimation factor, whereas the a priori 
weighting (S = R - .25W) gives an estimated loading of .40 on the 
length-estimation factor. Thus, with the optimal weighting, more than three 
times as much of the desired variance would be obtainable. 
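
The "more than three times" figure can be checked from the two estimated loadings, on the assumption (usual for orthogonal factors, but not stated in the text) that the proportion of a score's variance attributable to a factor is the square of its loading:

```latex
\frac{(.70)^2}{(.40)^2} = \frac{.49}{.16} \approx 3.06
```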